Job of the Week: HPC Storage Infrastructure Engineer at NERSC

NERSC is seeking an HPC Storage Infrastructure Engineer for its Storage Systems Group. This group is responsible for architecting, deploying, and supporting the high-performance parallel storage systems relied upon by NERSC’s 7,000 scientific users to conduct basic scientific research across a wide range of disciplines. “The HPC Storage Infrastructure Engineer will work closely with approximately eight other storage systems and software engineers in this group to support and optimize hundreds of petabytes of parallel storage that is served to thousands of clients at terabytes per second.”

Perlmutter Supercomputer to Include More Than 6,000 NVIDIA A100 Processors

NERSC is among the early adopters of the new NVIDIA A100 Tensor Core GPU processor announced by NVIDIA this week. More than 6,000 of the A100 chips will be included in NERSC’s next-generation Perlmutter system, which is based on an HPE Cray Shasta supercomputer that will be deployed at Lawrence Berkeley National Laboratory later this year. “Nearly half of the workload running at NERSC is poised to take advantage of GPU acceleration, and NERSC, HPE, and NVIDIA have been working together over the last two years to help the scientific community prepare to leverage GPUs for a broad range of research workloads.”

AMD Wins Slot in Latest NVIDIA A100 Machine Learning System

Today AMD demonstrated continued momentum in HPC with NVIDIA’s announcement that 2nd Generation AMD EPYC 7742 processors will power its new DGX A100 dedicated AI and Machine Learning system. AMD has racked up an impressive set of HPC wins over the past year, and has been chosen by the DOE to power two pending exascale-class supercomputers, Frontier and El Capitan. “2nd Gen AMD EPYC processors are the first and only current x86-architecture server processor supporting PCIe 4.0, providing up to 128 lanes of I/O per processor for high performance computing and connections to other devices like GPUs.”

Atos Launches First Supercomputer Equipped with NVIDIA A100 GPU

Today Atos announced its new BullSequana X2415, the first supercomputer in Europe to integrate the NVIDIA A100 Tensor Core GPU, which is built on NVIDIA’s next-generation Ampere architecture. This new supercomputer blade will deliver unprecedented computing power to boost application performance for HPC and AI workloads, tackling the challenges of the exascale era. The BullSequana X2415 blade will increase computing power by more than 2X and optimize energy consumption thanks to Atos’ patented, highly efficient, 100% water-cooled DLC (Direct Liquid Cooling) solution, which uses warm water to cool the machine.

New NVIDIA DGX A100 Packs Record 5 Petaflops of AI Performance for Training, Inference, and Data Analytics

Today NVIDIA unveiled the NVIDIA DGX A100 AI system, delivering 5 petaflops of AI performance and consolidating the power and capabilities of an entire data center into a single flexible platform. “DGX A100 systems integrate eight of the new NVIDIA A100 Tensor Core GPUs, providing 320GB of memory for training the largest AI datasets, and the latest high-speed NVIDIA Mellanox HDR 200Gbps interconnects.”

Novel Liquid Cooling Technologies for HPC

In this special guest feature, Robert Roe from Scientific Computing World writes that increasingly power-hungry and high-density processors are driving the growth of liquid and immersion cooling technology. “We know that CPUs and GPUs are going to get denser and we have developed technologies that are available today which support a 500-watt chip the size of a V100 and we are working on the development of boiling enhancements that would allow us to go beyond that.”

Job of the Week: R&D Operations and Maintenance Lead at Lockheed Martin

Lockheed Martin is seeking an R&D Operations and Maintenance Lead in our Job of the Week. “This position is the CSCF Program’s Operations and Maintenance Lead. This position is responsible for managing a small team of geographically diverse System Administrators in a Research and Development (R&D), Multi User High Performance Computer (HPC), Multi Level Secure (MLS) Data Center on a 5×12 schedule.”

Agenda Posted for OpenFabrics Virtual Workshop

The OpenFabrics Alliance (OFA) has opened registration for its OFA Virtual Workshop, taking place June 8-12, 2020. “The OpenFabrics Alliance is committed to accelerating the development of high performance fabrics. This virtual event will provide fabric developers and users an opportunity to discuss emerging fabric technologies, collaborate on future industry requirements, and address challenges.”

Katie Antypas Named Director of Hardware & Integration at Exascale Computing Project

The Exascale Computing Project has selected Berkeley Lab’s Katie Antypas as its new Director for the project’s Hardware & Integration Focus Area. “Katie has more than 14 years of experience at Berkeley Lab and is a widely recognized speaker and presenter throughout the HPC community. We are thrilled to have her take on such a critical function of leading this group and ensuring the project’s success in interfacing with the DOE HPC facilities.”

Podcast: A Shift to Modern C++ Programming Models

In this Code Together podcast, Alice Chan from Intel and Hal Finkel from Argonne National Lab discuss how the industry is uniting to address the need for programming portability and performance across diverse architectures, a need that is particularly important with the rise of data-intensive workloads like artificial intelligence and machine learning. “We discuss the important shift to modern C++ programming models, and how the cross-industry oneAPI initiative, and DPC++, bring much-needed portable performance to today’s developers.”