InfiniBand Powers World’s Leading Weather Forecasters’ Supercomputers

In this feature article from our friends over at Mellanox, we discuss how weather and climate models are both compute and data intensive. Forecast quality scales with modeling complexity and resolution. Resolution depends on the performance of supercomputers. And supercomputer performance depends on the underlying interconnect technology: to get higher performance, the interconnect must be able to move data quickly, effectively and in a scalable manner across compute resources.

Arm Throwing Elbows: LRZ to Deploy Arm-based HPE Cray CS500

It’s been a good week for Arm: the Fugaku supercomputer at Japan’s Riken research center was named No. 1 on the TOP500 list of the world’s most powerful HPC systems, and today the Leibniz Supercomputing Centre (LRZ) in Munich announced it will deploy HPE’s Cray CS500 with Fujitsu A64FX chips based on the Arm architecture, the same processor used in Fugaku. (And then there’s Apple switching from x86 to new Arm chips.)

Purdue’s ‘Anvil’ to Be Driven by Dell, AMD ‘Milan’ CPUs, Nvidia A100 Tensor Core GPUs

Another in a series of National Science Foundation supercomputing awards has been announced, this one $10 million in funding for a system to be housed at Purdue University to support HPC and AI workloads and scheduled to enter production next year. The system, dubbed Anvil, will be built in partnership with Dell and AMD and […]

NVIDIA Mellanox ConnectX-6 Lx SmartNIC Accelerates Cloud and Enterprise Workloads

Today NVIDIA launched the NVIDIA Mellanox ConnectX-6 Lx SmartNIC — a highly secure and efficient 25/50 gigabit per second (Gb/s) Ethernet smart network interface controller (SmartNIC) — to meet surging growth in enterprise and cloud scale-out workloads. “ConnectX-6 Lx, the 11th generation product in the ConnectX family, is designed to meet the needs of modern data centers, where 25Gb/s connections are becoming standard for handling demanding workflows, such as enterprise applications, AI and real-time analytics.”

Lenovo to deploy 17 Petaflop supercomputer at KIT in Germany

Today Lenovo announced a contract for a 17 petaflop supercomputer at Karlsruhe Institute of Technology (KIT) in Germany. Called HoreKa, the system will come online this fall and will be handed over to the scientific communities by summer 2021. The procurement contract is reportedly on the order of EUR 15 million. “The result is an innovative hybrid system with almost 60.000 next-generation Intel Xeon Scalable Processor cores and 220 terabytes of main memory as well as 740 NVIDIA A100 Tensor Core GPUs. A non-blocking NVIDIA Mellanox InfiniBand HDR network with 200 GBit/s per port is used for communication between the nodes. Two Spectrum Scale parallel file systems offer a total storage capacity of more than 15 petabytes.”

Agenda Posted for OpenFabrics Virtual Workshop

The OpenFabrics Alliance (OFA) has opened registration for its OFA Virtual Workshop, taking place June 8-12, 2020. “The OpenFabrics Alliance is committed to accelerating the development of high performance fabrics. This virtual event will provide fabric developers and users an opportunity to discuss emerging fabric technologies, collaborate on future industry requirements, and address challenges.”

How to Achieve High-Performance, Scalable and Distributed DNN Training on Modern HPC Systems

DK Panda from Ohio State University gave this talk at the Stanford HPC Conference. “This talk will focus on a range of solutions being carried out in my group to address these challenges. The solutions will include: 1) MPI-driven Deep Learning, 2) Co-designing Deep Learning Stacks with High-Performance MPI, 3) Out-of-core DNN training, and 4) Hybrid (Data and Model) parallelism. Case studies to accelerate DNN training with popular frameworks like TensorFlow, PyTorch, MXNet and Caffe on modern HPC systems will be presented.”

NVIDIA Completes Acquisition of Mellanox

NVIDIA today announced the completion of its acquisition of Mellanox for a transaction value of $7 billion. “With Mellanox, the new NVIDIA has end-to-end technologies from AI computing to networking, full-stack offerings from processors to software, and significant scale to advance next-generation data centers. Our combined expertise, supported by a rich ecosystem of partners, will meet the challenge of surging global demand for consumer internet services, and the application of AI and accelerated data science from cloud to edge to robotics.”

NVIDIA Receives Approval to Proceed with Mellanox Acquisition

Today NVIDIA announced that it has received approval from all necessary authorities to proceed with its planned acquisition of Mellanox, as announced in March 2019. “This exciting transaction would unite two HPC industry leaders and strengthen the combined company’s ability to create data-centric system architectures for the convergence of the HPC and hyperscale markets around AI and other HPDA tasks,” said Steve Conway from Hyperion Research.

SDSC Expanse Supercomputer from Dell Technologies to serve 50,000 Users

In this special guest feature, Janet Morss at Dell Technologies writes that the company will soon deploy a new flagship supercomputer at SDSC. “Expanse will deliver the power of 728 dual-socket Dell EMC PowerEdge C6525 servers with 2nd Gen AMD EPYC processors connected with Mellanox HDR InfiniBand. The system will have 93,000 compute cores and is projected to have a peak speed of 5 petaflops. That will almost double the performance of SDSC’s current Comet supercomputer, also from Dell Technologies.”