High Performance Computing (HPC) has always been about pushing the limits of technology. The evolution of HPC systems can be broken into four distinct epochs defined by the economics and technology of the day. Initially, specialized supercomputers were built that excelled at floating-point math (measured in FLOPS). As fabrication costs escalated, clusters of low-cost servers connected with high-speed interconnects (e.g., InfiniBand) replaced these monolithic multiprocessor systems. The next epoch is defined by the multi/many-core revolution, where physical limitations required chip designs to expand the number of processing cores instead of increasing processor frequency. The final and most recent epoch is that of co-design, where HPC systems are purpose-built using input from application users and software developers.
This is the first article in a series from the insideHPC Guide to Co-Design Architectures.
Network co-design includes offloading network responsibilities from the host processor to the network card. Non-offload designs, which use the processor for all the heavy lifting (onloading), have relied on rising processor speeds and the growth of multicore systems to keep data moving at top speed. However, research has shown that this reliance on clock speed can severely degrade network performance when slower processors are used. In particular, the power budget required to put more cores in a single processor has actually forced clock speeds lower. Benchmarks of real applications have shown an advantage for offload (vs. onload) HPC networks.
Mellanox, an InfiniBand vendor, has extended the local offload model to include MPI collectives, SHMEM/PGAS, and UPC-based applications. The introduction of the Scalable Hierarchical Aggregation Protocol (SHArP) by Mellanox pushes the offload further, into the network itself, allowing optimizations that are not possible at the local level. Benchmarks indicate that performance actually improves as HPC clusters scale in size (normally the opposite is true). Other emerging protocols that allow deeper network co-design include Unified Communication X (UCX) and Cache Coherent Interconnect for Accelerators (CCIX).
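To make the idea of aggregation inside the network more concrete, the following Python sketch combines values level by level through a hypothetical tree of switches. It is a toy model only, not the SHArP protocol or any Mellanox API; the function name `tree_reduce` and the `radix` parameter (how many inputs each imagined switch combines) are assumptions made for illustration.

```python
from functools import reduce
import operator


def tree_reduce(values, radix=4, op=operator.add):
    """Combine values level by level, as a hypothetical tree of switches might.

    Returns (result, levels), where levels is the number of aggregation
    stages the data passed through. This is an illustrative model only.
    """
    level = list(values)
    if not level:
        raise ValueError("need at least one value")
    levels = 0
    while len(level) > 1:
        # Each imagined switch combines up to `radix` inputs into one partial result.
        level = [
            reduce(op, level[i:i + radix])
            for i in range(0, len(level), radix)
        ]
        levels += 1
    return level[0], levels


# 64 contributions with radix 4 pass through 3 stages: 64 -> 16 -> 4 -> 1.
total, stages = tree_reduce(range(64), radix=4)
```

In this model the number of stages grows logarithmically with the number of contributors: 64 inputs need 3 stages at radix 4, while 4096 inputs need 6.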
The recent supercomputing refresh in the United States was announced as part of the Collaboration of Oak Ridge, Argonne, and Lawrence Livermore Labs (CORAL) procurement. Co-design is an integral part of these systems and is expected to contribute to their enhanced performance when they become operational in the 2017/2018 time frame.
Finally, co-design is not limited to HPC systems. Big data analytics typically relies on general-purpose tools (e.g., Hadoop, Spark), and large performance gains are realized when systems are optimized for a specific problem.
Co-design and offloading are important tools in achieving Exascale computing. Application developers and system designers can take advantage of network offload and emerging co-design protocols to accelerate their current applications. Applying some basic co-design and offloading methods to smaller-scale systems can deliver more performance from less hardware, resulting in lower cost and higher throughput.
Over the next several weeks we will explore each of these topics in detail.
- The Evolution of HPC
- The First Step in Network Co-design: Offloading
- Network Co-design as a Gateway to Exascale
- Co-design for Data Analytics and Machine Learning