Rock Stars of HPC: DK Panda

This Rock Stars of HPC series is about the men and women who are changing the way the HPC community develops, deploys, and operates supercomputers, and about the social and economic impact of their discoveries.

DK Panda, Ohio State University

Over the past seven years here at insideHPC, I’ve spent a lot of time on the road at high performance computing events. In that time, perhaps no other speaker has been more prolific than DK Panda from Ohio State University. As our newest Rock Star of HPC, DK sat down with us to discuss his passion for teaching High Performance Computing.

insideHPC: What first sparked your passion for HPC?

DK Panda: I have been working on high-performance computing for more than 30 years now. During my M.S. studies, I was exposed to the concepts of parallel computing. It was fascinating to see how you can combine the computing power of multiple processors (processing elements, as they were called at that time) to solve a bigger problem in less time. During that time, “Dataflow Computing” was a hot topic, and I chose a Master’s thesis along this direction. My thesis focused on designing an efficient architecture for dataflow computing. The results produced by this thesis strengthened my passion to work in this field for my career.

Since then, I have been focusing on multiple aspects related to parallel computing and HPC. My Ph.D. thesis also focused on high-performance networking and architectures for coarse-grain multiprocessing. After I joined The Ohio State University, I worked for many years with wormhole routing mechanisms and schemes to design scalable communication libraries for HPC systems. As the networking technologies for HPC systems gradually moved to Myrinet, Quadrics, InfiniBand, iWARP, RoCE and Omni-Path, I have continued to work with these technologies and have kept proposing innovative solutions for designing high-performance, scalable, and fault-tolerant communication architectures and programming model support for HPC systems.

insideHPC: The open source HPC software developed by your team is used all over the world. Can you tell us more about that?

DK Panda: You are referring to the MVAPICH project. We started working on this project when InfiniBand technology was introduced in 2000. My team was the first one to investigate new ways to extract the benefits of RDMA mechanisms and features in InfiniBand for the MPI (Message Passing Interface) programming model. While continuing with the publications, we found a need in the community to start using these solutions on production HPC clusters. My team went ahead, incorporated our solutions into an open-source MPI library (MVAPICH), and made it available to the community. The first version was released during Supercomputing 2002. Our library was used in System-X from Virginia Tech, which became the 3rd-ranked system (and one of the first InfiniBand systems) in the TOP500 list in 2003.

Since then, as InfiniBand technology has progressed with multiple new features and mechanisms, and new technologies/initiatives have been proposed for iWARP, RoCE and Omni-Path, my team has been continuously working on proposing new solutions for all these technologies and has been incorporating these solutions into the MVAPICH library. We have also incorporated optimized solutions related to the latest MPI and PGAS standards, accelerators (NVIDIA GPGPUs), multi-core/many-core processors, power-aware designs within the MPI library, integrated network management and tools, and HPC cloud technologies. The MVAPICH software family now covers all different kinds of HPC clusters being deployed and used in the field. In addition, my team also has created MPI- and PGAS-level micro-benchmarks and tools to analyze the performance of MPI and PGAS communication primitives.

The MVAPICH software libraries, micro-benchmarks, and tools are extensively used in the HPC community. These libraries have been powering several supercomputers in the TOP500 list during the last decade. Examples (from the Nov ’16 ranking) include: the 1st-ranked 10,649,600-core system (Sunway TaihuLight) at the National Supercomputing Center in Wuxi, China; the 13th-ranked 241,108-core system (Pleiades) at NASA; the 17th-ranked 462,462-core system (Stampede) at TACC; and the 40th-ranked 74,520-core system (Tsubame 2.5) at Tokyo Institute of Technology. As of April ’17, more than 2,775 organizations in 84 countries (based on voluntary registration) are using these libraries. More than 415,000 downloads of these libraries have taken place from the project’s website. These libraries are also available in the software stacks of many different hardware vendors, software vendors, and Linux distros. These libraries are enabling hundreds of thousands of MPI and PGAS users worldwide on a daily basis to make giant leaps and breakthroughs in their disciplines.

insideHPC: You travel extensively every year to speak at conferences and meet with HPC users. Why is this important to your mission as a teacher?

DK Panda: Yes, for the last 16 years, I have been traveling extensively worldwide to deliver tutorials, keynote talks, and invited talks, and to participate in panels at many events. Through these events, I have met a large number of HPC users working at all different layers of hardware, software, and applications of HPC systems. Many of these users also work in many different vertical domains. It has been an amazing experience. As an educator and teacher, I firmly believe in sharing knowledge and empowering people with it. Through these events, I have been able to deliver cutting-edge information and knowledge related to HPC, Exascale computing, programming models, high-performance networking, Big Data, and Deep Learning to thousands of people worldwide. I regularly meet people at these events who have been using MVAPICH and other software libraries designed and developed in my group. I have heard many success stories about how these people and their organizations are using our libraries to advance their research and development work. It has always been a very satisfying experience to know that my team and I have been able to help these people and their communities make advances in their respective disciplines. Many times, I also get critical feedback and suggestions from attendees at these events regarding how we can enhance and strengthen our libraries. I bring this feedback to my team. We work on incorporating their feedback and suggestions into our next phase of research, development and software releases. Thus, such engagements have been a continuous learning experience for me and my team.

insideHPC: What are the biggest software challenges for accelerated computing?

DK Panda: During the last several years, HPC systems have been going through rapid changes to incorporate accelerators. The main software challenge for such systems has been to provide efficient support for programming models with high performance and high productivity. For NVIDIA GPU-based systems, seven years ago, my team introduced a novel ‘CUDA-aware MPI’ concept. This paradigm frees application developers from having to use explicit CUDA calls to perform data movement. The MPI library incorporates the necessary calls to move data from/to GPU devices and does it with the highest performance. The MVAPICH2-GDR library from my group incorporates such designs and delivers both high performance and high productivity. This new concept has been adopted by many other MPI stacks. This concept has also been extended to PGAS models, such as OpenSHMEM, by my team. This new concept is giving a large number of GPU users ease of programming while harnessing the best performance from their accelerator-based systems. As next-generation systems with GPUs are becoming complex in their configurations, new concepts and paradigms need to be researched and explored to provide efficient support for programming models for these systems.

insideHPC: Machine Learning is a very hot topic these days. What is your team working on in this area?

DK Panda: Yes, Machine Learning and Deep Learning are becoming very hot topics. This field is evolving along two directions: 1) exploiting MPI libraries and 2) exploiting Big Data stacks like Spark. My team has been working in both of these directions. Currently, many Deep Learning frameworks (such as CNTK and Caffe) are using collective operations like broadcast, reduce and all-reduce with large message sizes. We have optimized such collective operations in our MVAPICH2-GDR library to deliver the best performance and scalability for these Deep Learning frameworks. We have also worked on co-designing the Caffe stack to deliver both scale-up and scale-out. This enhanced version of Caffe is available under the High-Performance Deep Learning project from my group. Under the second direction, for the last several years, my team has been working on bringing high performance and scalability to the commonly used Big Data stacks (Hadoop, Spark, and Memcached). We have proposed and designed novel schemes such that these Big Data stacks can run on current-generation HPC clusters with InfiniBand and RoCE networking technologies and parallel file systems like Lustre. These enhanced libraries (RDMA-Spark, RDMA-Hadoop, and RDMA-Memcached) are available under the High-Performance Big Data (HiBD) project from my team. Many Deep Learning libraries are able to take advantage of the HiBD libraries to extract higher performance and scalability on HPC clusters.
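
The all-reduce collective mentioned in the answer above sums a vector (for example, a gradient) across all workers and leaves every worker holding the same result. The following is a minimal single-process Python sketch of one common way to do this, the ring algorithm. It is an illustration only: the interview does not say how MVAPICH2-GDR implements the operation, no real MPI or GPU calls are made, and the function name `ring_allreduce` and the sample data are hypothetical.

```python
def ring_allreduce(worker_grads):
    """Simulate a sum all-reduce over workers arranged in a ring.

    worker_grads: list of equal-length lists, one per simulated worker.
    Returns a new list of lists; every worker ends with the element-wise sum.
    """
    p = len(worker_grads)
    n = len(worker_grads[0])
    if any(len(g) != n for g in worker_grads):
        raise ValueError("all workers must hold vectors of the same length")

    data = [list(g) for g in worker_grads]  # do not modify the inputs
    if p == 1:
        return data

    # Split the index space into p contiguous chunks (some may be empty).
    bounds = [(i * n) // p for i in range(p + 1)]
    chunks = [range(bounds[i], bounds[i + 1]) for i in range(p)]

    def exchange(chunk_for, combine):
        # Build every message first so that all sends in a step
        # appear to happen at the same time, as they would in a real ring.
        msgs = []
        for r in range(p):
            c = chunk_for(r)
            msgs.append(((r + 1) % p, c, [data[r][i] for i in chunks[c]]))
        for dst, c, payload in msgs:
            for i, v in zip(chunks[c], payload):
                data[dst][i] = combine(data[dst][i], v)

    # Phase 1, reduce-scatter: after p - 1 steps, worker r holds the
    # fully summed chunk (r + 1) % p.
    for s in range(p - 1):
        exchange(lambda r: (r - s) % p, lambda old, new: old + new)

    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for s in range(p - 1):
        exchange(lambda r: (r + 1 - s) % p, lambda old, new: new)

    return data


# Three simulated workers, each holding a four-element "gradient".
grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
result = ring_allreduce(grads)
print(result[0])  # [111, 222, 333, 444]
```

In this scheme each worker sends roughly 2(p - 1)/p times the vector size in total, regardless of the number of workers, which is one reason ring-style algorithms are often considered for the large messages that Deep Learning frameworks exchange.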

insideHPC: How important is Open Source to your efforts?

DK Panda: Open source is quite important to the efforts in my group. As indicated above, the open-source MVAPICH project started in the year 2000 and has been going strong for the last 17 years. Due to its open-source nature, many designs incorporated in the MVAPICH project have been adopted by other MPI libraries and middleware for HPC systems. The MVAPICH project has been a vital and strong component of the InfiniBand ecosystem during the last 17 years. In addition to the direct benefits to HPC systems and users, the open-source MVAPICH codebase has also been used extensively by many students and professionals to learn how to program and use InfiniBand and other RDMA-based networking technologies. Such learning has helped to train the next generation of HPC professionals. We hope to continue with such efforts in the coming years.

DK Panda will be a Featured Speaker at GTC 2017 with three talks.

See DK Panda speak at the GPU Technology Conference in Santa Clara, CA on May 8-11.
