Paving the Road to Exascale with Co-Design Architecture

In this special guest feature from the Print’n Fly Guide to SC15 in Austin, Scot Schultz from Mellanox writes that a new era of Co-Design will pave the way to Exascale.

Scot Schultz, Mellanox

Over the past decade, high performance computing has scaled from Teraflop performance to Petaflop performance, and is now heading toward the Exaflop era. Technology development has had to keep pace to enable these leaps, with notable advancements including the move from SMP architecture to clustered multiprocessing with multi-core processors, as well as added acceleration from GPUs, FPGAs and other co-processing technologies.

Historically, performance gains have come from developing the individual hardware devices, drivers, middleware, and software applications separately, with each advance improving scalability and throughput. That approach is reaching its limits. Enabling the next order-of-magnitude performance improvement for Exascale-class computing will require technology collaboration in all areas. Discrete development followed by conventional integration cannot meet the requirements of Exascale, because no one company or development effort can efficiently provide all the components necessary to scale performance to such a degree. For that reason, a system-level approach to Exascale computing is already underway.

A New Era of Co-Design

Co-Design is a collaborative effort among industry thought leaders, academia, and manufacturers to reach Exascale performance by taking a holistic system-level approach to fundamental performance improvements. Co-Design architecture enables all active system devices to become acceleration devices by orchestrating a more effective mapping of communication between devices in the system. This produces a well-balanced architecture across the various compute elements, networking, and data storage infrastructures that improves system efficiency and can even reduce power consumption.

Exascale computing will undoubtedly include three primary concepts: heterogeneous systems, direct communication through a more sophisticated intelligent network, and backward/forward compatibility. Co-Design includes these concepts in order to create an evolutionary architectural approach that will enable Exascale-class systems.

Seamless Heterogeneous System Architecture



One recent example of a more unified approach to enabling heterogeneous systems is the OpenUCX project. OpenUCX is a collaborative effort of industry, laboratories, and academia, working together to create an open production-grade communication framework for high-performance computing applications. OpenUCX is already well underway and addresses a fundamental concern: application portability across a variety of hardware, without the need to migrate applications and the system software stack for every type of infrastructure. The participants in this initiative include IBM, NVIDIA, Mellanox, the University of Houston, Oak Ridge National Laboratory, The University of Tennessee and many others. The project also has an advisory panel of thought leaders to guide the efforts toward the most effective solutions for Exascale.

UCX was initially created by merging three existing HPC frameworks:

  • Oak Ridge was working on an interface called UCCS, which was their framework supporting SHMEM over their systems.
  • IBM was working on PAMI, which was their interface for the Blue Gene/Q supercomputer.
  • Mellanox was working on MXM, its messaging accelerator for MPI or PGAS, which already used a co-design approach to parallel programming libraries.

UCX will replace all three by supporting the communication libraries they served (such as MPI, PGAS, and SHMEM) on one side and all hardware interfaces on the other side. The result of this approach is an optimized communication path with low software overhead, producing near-bare-metal performance and portability of software from one interconnect to another.
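The layering described above, with programming-model libraries on one side and hardware interfaces on the other, can be illustrated with a minimal Python sketch. This is not UCX code and does not use the UCX API: the class and function names (`Transport`, `CommLayer`, `mpi_style_send`, `shmem_style_put`) are hypothetical and exist only to show how a single middle layer lets the same application-side call run over different interconnects.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical hardware-facing interface (illustrative, not a UCX type)."""

    name = "abstract"

    @abstractmethod
    def send(self, dest, payload):
        """Deliver payload to rank `dest` and return a trace string."""


class InfiniBandTransport(Transport):
    name = "infiniband"

    def send(self, dest, payload):
        # A real transport would post a work request to the adapter here.
        return f"{self.name}: {len(payload)} bytes -> rank {dest}"


class SharedMemoryTransport(Transport):
    name = "shared-memory"

    def send(self, dest, payload):
        # A real transport would copy into a shared segment here.
        return f"{self.name}: {len(payload)} bytes -> rank {dest}"


class CommLayer:
    """Single middle layer: front ends sit above it, transports below it."""

    def __init__(self, transport):
        self.transport = transport

    def put(self, dest, payload):
        return self.transport.send(dest, payload)


def mpi_style_send(layer, dest, payload):
    """Stand-in for a message-passing front end."""
    return layer.put(dest, payload)


def shmem_style_put(layer, dest, payload):
    """Stand-in for a PGAS/SHMEM front end."""
    return layer.put(dest, payload)


def demo():
    """Run both front ends over both transports without changing the callers."""
    traces = []
    for transport in (InfiniBandTransport(), SharedMemoryTransport()):
        layer = CommLayer(transport)
        traces.append(mpi_style_send(layer, 1, b"halo"))
        traces.append(shmem_style_put(layer, 2, b"exchange"))
    return traces


if __name__ == "__main__":
    for line in demo():
        print(line)
```

The front-end functions never name a transport, which is the portability property the paragraph describes: swapping the interconnect changes only which `Transport` object is handed to the middle layer.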

A More Intelligent Interconnect

Direct communication is another important concept in achieving Exascale computing. It provides a direct peer-to-peer communication path between acceleration devices, which significantly decreases latency and removes the CPU from the data path of network communication. GPUDirect® RDMA is another example of co-design collaboration, in this case between Mellanox and NVIDIA. It allows direct peer-to-peer communication between remote GPUs over the Mellanox fabric, completely bypassing the need for CPU and host memory intervention to move data. This capability reduces latency for internode GPU communication by upwards of 70%.
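To make the difference between the two data paths concrete, the following Python sketch lists the stages of a host-staged GPU-to-GPU transfer next to the direct path, and works through what a 70% latency reduction means for a given baseline. The stage lists are a simplified illustration, and the 10 microsecond baseline in the usage section is a hypothetical placeholder, not a measured value; only the 70% figure comes from the text above.

```python
# Simplified stage lists; real transfers involve more detail (PCIe, drivers).
HOST_STAGED_PATH = [
    "sender GPU memory -> sender host memory",
    "sender host memory -> sender NIC",
    "sender NIC -> receiver NIC",
    "receiver NIC -> receiver host memory",
    "receiver host memory -> receiver GPU memory",
]

DIRECT_PATH = [
    "sender GPU memory -> sender NIC",
    "sender NIC -> receiver NIC",
    "receiver NIC -> receiver GPU memory",
]


def stages_touching_host_memory(path):
    """Count the stages in a path that read or write host memory."""
    return sum("host memory" in stage for stage in path)


def latency_after_reduction(baseline_us, reduction=0.70):
    """Return the latency left after cutting `reduction` of the baseline.

    `reduction` is a fraction between 0 and 1; 0.70 mirrors the figure
    quoted in the article.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return baseline_us * (1.0 - reduction)


if __name__ == "__main__":
    print("host-staged stages:", len(HOST_STAGED_PATH))
    print("direct stages:", len(DIRECT_PATH))
    print("host-memory stages removed:",
          stages_touching_host_memory(HOST_STAGED_PATH))
    # Hypothetical baseline, chosen only to make the arithmetic visible.
    baseline_us = 10.0
    print("remaining latency (us):",
          round(latency_after_reduction(baseline_us), 3))
```

In this simplified model the direct path drops every stage that touches host memory, which is where the latency saving comes from.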

Continued development of this technology will soon lead to the next generation of GPUDirect RDMA, known as GPUDirect ASYNC. It adds further key aspects of peer-to-peer transactions: it gives the accelerator more control over network operations and offloads the control plane, in addition to the data path, from the CPU. The result will further reduce latency, allow much lower-power CPUs to be coupled with GPU acceleration capabilities, and address power reduction across the peer devices that will be typical in a heterogeneous system balanced with both vector and scalar components.

Backward and Forward Compatibility

Another important concept in reaching Exascale is compatibility. Backward compatibility must always be a consideration when advancing technologies with performance improvements, but forward compatibility will be of paramount importance in implementing Exascale computing. Whereas it is not uncommon today for 10-20 Petaflop machines to be completely replaced within a five-year period, Exascale machines will not be so easily replaced. As such, co-design includes the use of open standards for portability and compatibility, ensuring that Exascale computing can be achieved without the fear that clusters will need to be entirely overhauled or upgraded.

A common concern with the traditional approach (in which technologies are integrated instead of Co-Designed) involves point-to-point processor technologies such as QPI or HyperTransport. Such technologies have their own defined set of physical, link, routing, transport, and protocol layers, which have not remained consistent and compatible over time. This not only introduces backward compatibility issues between SoC technologies, but also rules out future-proofing for the next generation of integrated elements. Exascale systems must have guaranteed future-proofing to protect such a level of investment, performance, and capabilities, and to keep millions of lines of application code from being overhauled for every generation of hardware.

Paving the Road to Exascale

Mellanox has already released the lowest latency end-to-end 100Gb/s interconnect solution available, enabling even more data to be transferred in less time. EDR InfiniBand capabilities are already based on the co-design approach and include numerous offloading engines and acceleration capabilities that free the CPU cores from communications overhead, allowing the CPU to perform more meaningful application computation. This is the fundamental reason why Mellanox, along with industry partners and thought leaders, continues to drive the most powerful and the most efficient supercomputers in the world. Mellanox has already deployed ultra-low latency 100Gb/s InfiniBand and Ethernet technology, and is executing on the next generation of smart interconnect capabilities, paving the road to Exascale.

In order to reach Exaflop levels of scalability and performance, only co-design can provide a holistic system perspective that addresses the next order of magnitude of performance. Continued collaboration is crucial to achieving the flexibility, efficiency, and portability necessary to make the move to Exascale computing a reality.

This article was originally published in the Print’n Fly Guide to SC15 in Austin. We designed this Guide to be an in-flight magazine custom tailored for your journey to SC15, the world’s largest gathering of high performance computing professionals.
