Prying an Integrated HPC Cluster from Vendors’ Hands: ‘We Had to Force Them to Get into the Same Room’

Sometimes, vendors just won’t take customers’ money for a cluster no matter how hard the customer pleads. Last month, HPC technology strategist Ryan Quick, co-founder and principal at the Providentia Worldwide consulting firm, told us that while vendors have technologies capable of delivering solutions across the stack, too often they have disconnects between business units that make buying an integrated solution difficult, if not impossible. In this interview, Quick expands on those comments with problem scenarios and advice on handling vendors.

InfiniBand Powers World’s Leading Weather Forecasters’ Supercomputers

In this feature article from our friends over at Mellanox, we discuss how weather and climate models are both compute and data intensive. Forecast quality scales with modeling complexity and resolution. Resolution depends on the performance of supercomputers. And supercomputer performance depends on the underlying interconnect technology: to get higher performance, the interconnect must be able to move data quickly, effectively and in a scalable manner across compute resources.

Never Enough Bandwidth: Optical I/O Consortium Formed to Set Interconnect Standards

More than 20 companies have joined an industry consortium to establish specifications for multi-wavelength integrated optics, the emerging interconnect technology that advocates say is critical to next-generation HPC and AI. Announced today, the CW-WDM MSA (Continuous-Wave Wavelength Division Multiplexing Multi-Source Agreement) Group wants to build an ecosystem to work on common standards and interoperability for dense laser light sources, which in turn will enable broad adoption of optical I/O.

NVIDIA Mellanox ConnectX-6 Lx SmartNIC Accelerates Cloud and Enterprise Workloads

Today NVIDIA launched the NVIDIA Mellanox ConnectX-6 Lx SmartNIC — a highly secure and efficient 25/50 gigabit per second (Gb/s) Ethernet smart network interface controller (SmartNIC) — to meet surging growth in enterprise and cloud scale-out workloads. “ConnectX-6 Lx, the 11th generation product in the ConnectX family, is designed to meet the needs of modern data centers, where 25Gb/s connections are becoming standard for handling demanding workflows, such as enterprise applications, AI and real-time analytics.”

Liqid, Dell, and AMD power Industry’s Fastest Single-socket Storage Server

Today Liqid announced that it has worked with industry leaders AMD and Dell Technologies to deliver one of the fastest one-socket storage rack servers on the market. “Liqid’s composable Gen-4 PCI-Express (PCIe) fabric technology, the LQD4500, is coupled with the AMD EPYC 7002 Series Processors, and enclosed in Dell Technologies’ industry-leading Dell EMC PowerEdge R7515 Rack Server to deliver an architecture designed for the most demanding next-generation, AI-driven HPC application environments.”

Lenovo to deploy 17 Petaflop supercomputer at KIT in Germany

Today Lenovo announced a contract for a 17-petaflop supercomputer at Karlsruhe Institute of Technology (KIT) in Germany. Called HoreKa, the system will come online this fall and will be handed over to the scientific communities by summer 2021. The procurement contract is reportedly on the order of EUR 15 million. “The result is an innovative hybrid system with almost 60.000 next-generation Intel Xeon Scalable Processor cores and 220 terabytes of main memory as well as 740 NVIDIA A100 Tensor Core GPUs. A non-blocking NVIDIA Mellanox InfiniBand HDR network with 200 GBit/s per port is used for communication between the nodes. Two Spectrum Scale parallel file systems offer a total storage capacity of more than 15 petabytes.”

OFA and Gen-Z Consortium to advance industry standardization of open-source fabric management

The OFA and Gen-Z Consortium recently entered a Memorandum of Understanding (MoU) agreement to advance the industry standardization of open-source fabric management. “Potential activities outlined in the agreement include joint development of a roadmap guiding future enhancements and development of the libfabric API as well as an abstract fabric manager built on the concepts of Distributed Management Task Force’s (DMTF) Redfish standard.”

Agenda Posted for OpenFabrics Virtual Workshop

The OpenFabrics Alliance (OFA) has opened registration for its OFA Virtual Workshop, taking place June 8-12, 2020. “The OpenFabrics Alliance is committed to accelerating the development of high performance fabrics. This virtual event will provide fabric developers and users an opportunity to discuss emerging fabric technologies, collaborate on future industry requirements, and address challenges.”

How to Achieve High-Performance, Scalable and Distributed DNN Training on Modern HPC Systems

DK Panda from Ohio State University gave this talk at the Stanford HPC Conference. “This talk will focus on a range of solutions being carried out in my group to address these challenges. The solutions will include: 1) MPI-driven Deep Learning, 2) Co-designing Deep Learning Stacks with High-Performance MPI, 3) Out-of-core DNN training, and 4) Hybrid (Data and Model) parallelism. Case studies to accelerate DNN training with popular frameworks like TensorFlow, PyTorch, MXNet and Caffe on modern HPC systems will be presented.”

Video: Ayar Labs pushes Moore’s Law through Optical I/O technology

In this video, Mark Wade from Ayar Labs explains how the company’s optical I/O solution will address the critical computing challenges of efficiency, density, and distance for next-gen system architectures. “Our patented approach uses industry standard cost-effective silicon processing techniques to develop high speed, high density, low power optical based interconnect ‘chiplets’ and multi-wavelength lasers to replace traditional electrical based I/O.”