Overcoming Bandwidth and Capacity Bottlenecks in the Exascale Era


Sponsored Post

Bandwidth bottlenecks that limit inter-processor and memory performance are a major barrier to unlocking future system performance. Today’s high-performance computing (HPC) systems must process massive amounts of data, and data volumes will only grow in the exascale computing era. Processors and accelerators (XPUs) have advanced substantially to accelerate and optimize the processing of scientific and AI workloads, but more technology solutions are needed.

In HPC system architectures, compute and memory resources are tightly coupled. However, there is a growing speed disparity between the XPU and memory outside the XPU package, often caused by the limited bandwidth of the interconnect technologies between them. Current memory solutions, such as HBM and DDR5, are constrained by thermal and signal-integrity issues. Lawrence Livermore National Laboratory (LLNL) recently studied memory utilization on four large HPC clusters and concluded that, “Our results show that more than 90% of jobs utilize less than 15% of the node memory capacity, and for 90% of the time, memory utilization is less than 35%.”
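That headline metric is easy to sample on any Linux node. The sketch below is purely illustrative (it is not the instrumentation LLNL used); it computes utilization as (MemTotal − MemAvailable) / MemTotal from /proc/meminfo:

```c
/* Illustrative sketch only: sample one node's memory utilization from
 * /proc/meminfo (Linux). Not the LLNL study's methodology. */
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f) { perror("fopen"); return 1; }

    char line[256];
    long total_kb = 0, avail_kb = 0;
    while (fgets(line, sizeof line, f)) {
        sscanf(line, "MemTotal: %ld kB", &total_kb);
        sscanf(line, "MemAvailable: %ld kB", &avail_kb);
    }
    fclose(f);

    if (total_kb > 0)
        printf("memory utilization: %.1f%%\n",
               100.0 * (total_kb - avail_kb) / total_kb);
    return 0;
}
```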

Disaggregation to Break the Memory Bottleneck

Future HPC and AI system architectures will look to disaggregation and pooling of resources as the answer to some of these memory challenges. One approach is to split XPUs and memory into separate physical entities, creating a single pool of resources, such as shared DRAM, that any XPU can draw on. The system could, for example, connect cores to memory as requests arrive, so applications could use all of the memory available across an entire data center instead of being confined to the memory of a single server.
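To make the pooling idea concrete, here is a toy model in C. Everything in it is hypothetical (the pool size, job names, and the pool_alloc/pool_release functions are invented for illustration); the point is simply that jobs lease capacity from one shared pool rather than being capped at a single node’s DRAM:

```c
/* Toy model of a disaggregated memory pool. Names and sizes are
 * hypothetical, not a real fabric or allocator API. One shared capacity
 * is leased to jobs on demand instead of being statically partitioned
 * per node, so a large job can borrow what idle nodes leave unused. */
#include <stdio.h>

#define POOL_GIB 1024  /* total pooled DRAM across the rack (assumed) */

static long pool_free = POOL_GIB;

/* Lease capacity from the shared pool; returns 0 on success. */
static int pool_alloc(const char *job, long gib) {
    if (gib > pool_free) {
        printf("%s: denied (%ld GiB requested, %ld free)\n",
               job, gib, pool_free);
        return -1;
    }
    pool_free -= gib;
    printf("%s: granted %ld GiB (%ld GiB left in pool)\n",
           job, gib, pool_free);
    return 0;
}

static void pool_release(const char *job, long gib) {
    pool_free += gib;
    printf("%s: released %ld GiB (%ld GiB free)\n", job, gib, pool_free);
}

int main(void) {
    /* With per-node memory, a 600 GiB job cannot fit on a 256 GiB node.
     * Against the shared pool, it simply takes a bigger lease. */
    pool_alloc("small_job", 40);
    pool_alloc("large_job", 600);
    pool_release("small_job", 40);
    return 0;
}
```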

Optical I/O to Solve the Bandwidth Bottlenecks

Disaggregating and pooling system components raises questions of its own: how to manage workflows across such systems, and, more fundamentally, how to move data efficiently between the separated components. Enabling these new flexible system architectures will require high-bandwidth, low-latency interconnects.

A transition to photonics (or optical I/O) enables memory to be pooled with low latency and high performance. “Optical I/O is expected to be the foundation of new interconnects that will allow heterogeneous connectivity, with tremendous bandwidth, low latency and low power, across a range of new system designs,” states Vladimir Stojanovic, Chief Architect & Co-Founder, Ayar Labs. Optical I/O is also being explored as a transport for a variety of protocols to enable system scalability. One example is Compute Express Link (CXL), an emerging unified protocol for disaggregated systems that today uses PCIe electrical signaling for its I/O interconnect.
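Software can already experiment with this model: on current Linux kernels, CXL-attached memory expansion typically appears as a CPU-less NUMA node, so standard NUMA APIs can place data in the pooled tier. A minimal sketch using libnuma follows; the node number 2 is an assumption for illustration (check numactl --hardware on a real system):

```c
/* Minimal sketch: place a buffer on a CXL-attached memory node via
 * libnuma. ASSUMES the pooled/CXL memory shows up as NUMA node 2 on
 * this system (verify with `numactl --hardware`).
 * Build with: cc demo.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }

    const int cxl_node = 2;         /* ASSUMPTION: CXL memory node id */
    const size_t size = 1UL << 30;  /* 1 GiB */

    void *buf = numa_alloc_onnode(size, cxl_node);
    if (!buf) {
        perror("numa_alloc_onnode");
        return 1;
    }

    /* Touch the pages so they are actually faulted onto the target node. */
    memset(buf, 0, size);

    printf("1 GiB placed on node %d\n", cxl_node);
    numa_free(buf, size);
    return 0;
}
```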

Join us for the “Advanced Memory Architectures to Overcome Bandwidth Bottlenecks for the Exascale Era of Computing” webinar on November 10 at 9:00 am PT. During this webinar, leading industry experts will discuss the future of advanced memory architectures, new optical I/O solutions using silicon photonics, and the technologies and environments needed to make next-generation performance a reality.

Addison Snell, industry analyst at Intersect360, will lead the webinar panel discussion with experts from industry and U.S. national supercomputing laboratories:

  • Mohamad El-Batal, CTO Cloud Systems, Seagate
  • William Magro, Chief Technologist, High-Performance Computing, Google
  • Ivy Peng, Computer Scientist, Lawrence Livermore National Laboratory
  • Vladimir Stojanovic, Chief Architect & Co-Founder, Ayar Labs
  • Marten Terpstra, Sr Director, PLM & Business Development, High Performance Networking and Silicon Photonics, HPE

For more information on next-generation solutions to overcome memory bottlenecks, register for this upcoming webinar.