HPC System Management: Scheduling to Optimize Infrastructure

Whether an application is floating-point intensive, integer based, memory hungry, I/O heavy, or limited in how widely it can run by the number of purchased licenses, a system that assigns the right job to the right server is key to getting the most out of the computing infrastructure. We continue our insideHPC series of features exploring new resource management solutions for workload convergence, such as Bright Cluster Manager by Bright Computing. This article discusses how scheduling can optimize infrastructure and improve HPC system management.

Google Cloud TPU Machine Learning Accelerators now in Beta

John Barrus writes that Cloud TPUs are available in beta on Google Cloud Platform to help machine learning experts train and run their ML models more quickly. “Cloud TPUs are a family of Google-designed hardware accelerators that are optimized to speed up and scale up specific ML workloads programmed with TensorFlow. Built with four custom ASICs, each Cloud TPU packs up to 180 teraflops of floating-point performance and 64 GB of high-bandwidth memory onto a single board.”

A Closer Look at Meltdown and Spectre

In this video, Mark Handley will explain what modern CPUs actually do to go fast, discuss how this leads to the Meltdown and Spectre vulnerabilities, and summarize the mitigations that are being put in place. “Operating systems and hypervisors need significant changes to how memory management is performed, CPU firmware needs updating, compilers are being modified to avoid risky instruction sequences, and browsers are being patched to prevent scripts having access to accurate time.”

Supercomputing the Origin of Mass

In this video, Professor Derek Leinweber from the University of Adelaide presents his research in Lattice Quantum Field Theory, revealing the origin of mass in the universe. “While the fundamental interactions are well understood, elucidating the complex phenomena emerging from this quantum field theory is fascinating and often surprising. My explorations of QCD-vacuum structure featured in Professor Wilczek’s 2004 Physics Nobel Prize Lecture. Our approach to discovering the properties of this key component of the Standard Model of the Universe favors fundamental first-principles numerical simulations of QCD on supercomputers. This field of study is commonly referred to as Lattice QCD.”

Inside SATURNV – Insights from NVIDIA’s Deep Learning Supercomputer

Phil Rogers from NVIDIA gave this talk at SC17. “In this talk, we describe the architecture of SATURNV, and how we use it every day at NVIDIA to run our deep learning workloads for both production and research use cases. We explore how the NVIDIA GPU Cloud software is used to manage and schedule work on SATURNV, and how it gives us the agility to rapidly respond to business-critical projects. We also present some of the results of our research in operating this unique GPU-accelerated data center.”

Sylabs Startup forms Commercial Entity behind Singularity for HPC

Today an HPC startup called Sylabs entered the market to provide solutions and services based on Singularity, an open source container technology designed for high performance computing. Founded by the inventor and project lead for Singularity, Sylabs will license and support Singularity Pro, an enterprise version of the software, and introduce it to businesses in the enterprise and HPC commercial markets.

Addressing Fault Tolerance and Data Compression at Exascale

In this Let’s Talk Exascale podcast, Franck Cappello from Argonne National Laboratory describes the VeloC and EZ projects. “The VeloC project endeavors to provide ECP applications an optimal fault-tolerance environment, while the aim of the EZ project is to provide data reduction.”

Video: Lustre Generational Performance Improvements & New Features

Adam Roe from Intel gave this talk at LAD’17 in Paris. “Lustre has had a number of compelling new features added in recent releases; this talk will look at those features in detail and see how well they all work together from both a performance and functionality perspective. Comparing some of the numbers from last year we will see how far the Lustre filesystem has come in such a short period of time (LAD’16 to LAD’17), comparing the same use cases observing the generational improvements in the technology.”

Exploiting Modern Microarchitectures: Meltdown, Spectre, and other Attacks

Jon Masters from Red Hat gave this keynote at FOSDEM 2018. “Recently disclosed vulnerabilities against modern high performance computer microarchitectures known as ‘Meltdown’ and ‘Spectre’ are among an emerging wave of hardware-focused attacks. This talk will describe several of these attacks, how they can be mitigated, and generally what we can do as an industry to bring performance without trading security.”

How Deep Learning is Causing a ‘Seismic Shift’ in the Retail Industry

These days, taking big data further and integrating machine and deep learning means having a competitive edge in the retail industry. Analytics, and their analysis, have always been a cornerstone of retail success. Now deep learning techniques are set to push the data boom to the next level. A new insideHPC special report, courtesy of Dell EMC and NVIDIA, explores how the retail industry is being transformed by machine and deep learning.