Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems

DK Panda, Ohio State University

In this video from the Stanford HPC Conference, DK Panda from Ohio State University presents: Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems.

“This talk will focus on challenges in designing HPC, Deep Learning, and HPC Cloud middleware for Exascale systems with millions of processors and accelerators. For the HPC domain, we will discuss the challenges in designing runtime environments for MPI+X (PGAS-OpenSHMEM/UPC/CAF/UPC++, OpenMP, and CUDA) programming models by taking into account support for multi-core systems (KNL and OpenPOWER), high-performance networks, GPGPUs (including GPUDirect RDMA), and energy awareness. Features and sample performance numbers from the MVAPICH2 libraries will be presented. For the Deep Learning domain, we will focus on popular Deep Learning frameworks (Caffe, CNTK, and TensorFlow) to extract performance and scalability with the MVAPICH2-GDR MPI library and RDMA-enabled Big Data stacks. Finally, we will outline the challenges in moving these middleware to Cloud environments.”
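To make the MPI+X model mentioned in the abstract concrete, here is a minimal sketch of MPI combined with CUDA, assuming a CUDA-aware MPI build such as MVAPICH2-GDR. With such a library, GPU device pointers can be passed directly to MPI calls, and the library can use GPUDirect RDMA where the hardware supports it. This is an illustrative example, not code from the talk; the buffer size and file name are arbitrary.

```c
/* gpu_sendrecv.c: illustrative MPI+CUDA sketch (assumes a CUDA-aware MPI
 * such as MVAPICH2-GDR). A device pointer is handed straight to
 * MPI_Send/MPI_Recv with no explicit host staging. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;                       /* 1M floats */
    float *d_buf = NULL;
    cudaMalloc((void **)&d_buf, n * sizeof(float));

    if (rank == 0) {
        cudaMemset(d_buf, 0, n * sizeof(float));
        /* GPU buffer passed directly to MPI; the CUDA-aware library
         * handles the device-to-device or device-to-network transfer. */
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d floats into GPU memory\n", n);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```

With MVAPICH2-GDR this would typically be built with the library's mpicc, linked against the CUDA runtime, and launched on two ranks (CUDA support is enabled at run time via MV2_USE_CUDA=1); exact build and run flags depend on the installation.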

Dr. Dhabaleswar K. (DK) Panda is a Professor and Distinguished Scholar of Computer Science at The Ohio State University. He obtained his Ph.D. in computer engineering from the University of Southern California. His research interests include parallel computer architecture, high-performance networking, InfiniBand, network-based computing, exascale computing, programming models, GPUs and accelerators, high-performance file systems and storage, virtualization, cloud computing, and Big Data (Hadoop (HDFS, MapReduce, and HBase) and Memcached). He has published over 400 papers in major journals and international conferences in these research areas.

Dr. Panda and his research group members have been doing extensive research on modern networking technologies, including InfiniBand, Omni-Path, iWARP, and RoCE. His research group is currently collaborating with National Laboratories and leading InfiniBand, Omni-Path, iWARP, and RoCE companies on designing various subsystems of next-generation high-end systems. The MVAPICH (High Performance MPI and MPI+PGAS over InfiniBand, iWARP, and RoCE with support for GPGPUs, Xeon Phis, and Virtualization) software libraries, developed by his research group, are currently being used by more than 2,850 organizations worldwide (in 85 countries). These software packages have enabled several InfiniBand clusters to get into the latest TOP500 ranking. More than 440,000 downloads of this software have taken place from the project website alone. These software packages are also available with the software stacks of network vendors (InfiniBand, Omni-Path, RoCE, and iWARP), server vendors (OpenHPC), and Linux distributors (such as Red Hat and SuSE). This software is currently powering the #1 supercomputer in the world.

See more talks in the Stanford HPC Conference Video Gallery

Check out our insideHPC Events Calendar