RDMA, Scalable MPI-3 RMA, and Next-Generation Post-RDMA Interconnects


Torsten Hoefler from ETH Zurich

In this video from the Swiss HPC Conference, Torsten Hoefler from ETH Zurich presents: RDMA, Scalable MPI-3 RMA, and Next-Generation Post-RDMA Interconnects.

This talk won the best presentation award at the three-day conference.

Modern interconnects offer remote direct memory access (RDMA) features. Yet, most applications rely on explicit message passing for communication despite its unwanted overheads. The MPI-3.0 standard defines a programming interface for exploiting RDMA networks directly. We demonstrate how to efficiently implement the specification on modern RDMA networks. Our protocols support scaling to millions of cores with negligible memory consumption while providing the highest performance and minimal overheads, comparable to, or better than, UPC and CAF in terms of latency, bandwidth, and message rate. After this, we recognize that network cards contain rather powerful processors optimized for data movement, and limiting their functionality to remote direct memory access seems unnecessarily constraining. We develop sPIN, a portable programming model to offload simple packet-processing functions to the network card. To demonstrate the potential of the model, we design a cycle-accurate simulation environment by combining the network simulator LogGOPSim and the CPU simulator gem5. We implement offloaded message matching, datatype processing, and collective communications, and demonstrate transparent full-application speedups. Furthermore, we show how sPIN can be used to accelerate redundant in-memory filesystems and several other use cases. Our work investigates a portable packet-processing network acceleration model similar to compute acceleration with CUDA or OpenCL. We show how such network acceleration enables an ecosystem that can significantly speed up applications and system services.
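To give a flavor of the MPI-3 RMA interface discussed in the talk, the sketch below shows a minimal one-sided exchange: each process exposes a single double as an RMA window and writes its rank into its neighbor's memory with MPI_Put under passive-target synchronization. This is only an illustration of the standard API, not the optimized protocols from the talk; the window size and the neighbor communication pattern are arbitrary choices for the example.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Allocate and expose one double per process as an RMA window. */
    double *buf;
    MPI_Win win;
    MPI_Win_allocate(sizeof(double), sizeof(double), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &buf, &win);
    buf[0] = -1.0;

    /* Passive-target epoch: the target CPU does not participate in the transfer. */
    MPI_Win_lock_all(0, win);

    double val = (double)rank;
    int target = (rank + 1) % size;      /* write our rank into the neighbor's window */
    MPI_Put(&val, 1, MPI_DOUBLE, target, 0, 1, MPI_DOUBLE, win);
    MPI_Win_flush(target, win);          /* complete the put at origin and target */

    MPI_Win_unlock_all(win);
    MPI_Barrier(MPI_COMM_WORLD);         /* ensure everyone's put has landed */

    /* Lock our own window so the private copy is synchronized before reading
       (required under the separate memory model; harmless under the unified model). */
    MPI_Win_lock(MPI_LOCK_SHARED, rank, 0, win);
    printf("rank %d received %.0f from rank %d\n",
           rank, buf[0], (rank - 1 + size) % size);
    MPI_Win_unlock(rank, win);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Compiled with an MPI-3 compliant mpicc and run on a few ranks, each process prints the rank of its left neighbor, deposited entirely by one-sided puts without any matching receive on the target side.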

Torsten Hoefler directs the Scalable Parallel Computing Laboratory (SPCL) at D-INFK ETH Zurich. He received his PhD from Indiana University in 2007 and began his first professorship in 2011 at the University of Illinois at Urbana-Champaign.

Torsten served as the lead for performance modeling and analysis in the US NSF Blue Waters project at NCSA/UIUC. Since 2013, he has been a professor of computer science at ETH Zurich and has held visiting positions at Argonne National Laboratory, Sandia National Laboratories, and Microsoft Research Redmond (Station Q).

Dr. Hoefler’s research aims at understanding the performance of parallel computing systems, ranging from parallel computer architecture through parallel programming to parallel algorithms. He is also active in the application areas of weather and climate simulation as well as machine learning, with a focus on distributed deep learning. In those areas, he has coordinated tens of funded projects and has been awarded an ERC Starting Grant on Data-Centric Parallel Programming.

Download the paper: sPIN: High-performance streaming Processing in the Network

See more talks from the Swiss HPC Conference

Check out our insideHPC Events Calendar