Aug. 21-23: 11th Annual MVAPICH User Group (MUG) Conference in Columbus, OH

The 11th annual MVAPICH User Group (MUG) conference will be held Monday-Wednesday, Aug. 21-23 at the Ohio State University Translational Data Analytics Institute (TDAI), Pomerene Hall, Room #320 in Columbus, OH. Registration information can be found here. The MUG conference provides an open forum for attendees (users, system administrators, researchers, engineers, and students) to […]

9th Annual MVAPICH User Group (MUG) to Meet Aug. 23-25

The 9th Annual MVAPICH User Group (MUG) meeting will take place August 23-25, 2021 in Columbus, OH. According to the organizers, the MUG meeting is an open forum for users, system administrators, researchers, engineers and students to share their knowledge on using MVAPICH2 libraries (including MVAPICH2-X, MVAPICH2-GDR, MVAPICH2-X-Azure, and MVAPICH2-X-AWS), OSU Micro-Benchmarks (OMB), and OSU INAM on large-scale […]

Videos, Slides from MUG ’20 Now Available

The MVAPICH User Group Meeting (MUG ’20), which centers on MVAPICH, an implementation of the MPI standard developed by Ohio State University, has posted videos and slides from presentations on a variety of topics at its recent annual conference. The program included keynote talks from Brian van Essen of Lawrence Livermore National Laboratory and Michael Norman from San Diego Supercomputing […]

MUG ’20, Altair, Arm DevSummit Announce Conference Agendas

Several organizations have released updates on upcoming conferences and user group meetings. Here’s a summary with links to further information. The MVAPICH User Group Meeting (MUG ’20), which centers on MVAPICH, an implementation of the MPI standard developed by Ohio State University, will be an online event held from Monday, August 24th through Wednesday, August 26th. This […]

How to Achieve High-Performance, Scalable and Distributed DNN Training on Modern HPC Systems

DK Panda from Ohio State University gave this talk at the Stanford HPC Conference. “This talk will focus on a range of solutions being carried out in my group to address these challenges. The solutions will include: 1) MPI-driven Deep Learning, 2) Co-designing Deep Learning Stacks with High-Performance MPI, 3) Out-of-core DNN training, and 4) Hybrid (Data and Model) parallelism. Case studies to accelerate DNN training with popular frameworks like TensorFlow, PyTorch, MXNet and Caffe on modern HPC systems will be presented.”

Designing Scalable HPC, Deep Learning, Big Data, and Cloud Middleware for Exascale Systems

DK Panda from Ohio State University gave this talk at the UK HPC Conference. “This talk will focus on challenges in designing HPC, Deep Learning, Big Data and HPC Cloud middleware for Exascale systems with millions of processors and accelerators. For the HPC domain, we will discuss the challenges in designing runtime environments for MPI+X (PGAS – OpenSHMEM/UPC/CAF/UPC++, OpenMP, and CUDA) programming models by taking into account support for multi-core systems (Xeon, ARM and OpenPower), high-performance networks, and GPGPUs (including GPUDirect RDMA).”

Checkpointing the Un-checkpointable: MANA and the Split-Process Approach

Gene Cooperman from Northeastern University gave this talk at the MVAPICH User Group. “This talk presents an efficient, new software architecture: split processes. The ‘MANA for MPI’ software demonstrates this split-process architecture. The MPI application code resides in ‘upper-half memory’, and the MPI/network libraries reside in ‘lower-half memory’.”

The ABCI Supercomputer: World’s First Open AI Computing Infrastructure

Shinichiro Takizawa from AIST gave this talk at the MVAPICH User Group. “ABCI is the world’s first large-scale Open AI Computing Infrastructure, constructed and operated by AIST, Japan. It delivers 19.9 petaflops of HPL performance and the world’s fastest training time of 1.17 minutes in ResNet-50 training on the ImageNet dataset as of July 2019. In this talk, we focus on ABCI’s network architecture and the communication libraries available on ABCI, and show their performance and recent research achievements.”

Offering Bare-Metal Performance and Scalability on Cloud: The Azure-HPC Approach

Jithin Jose from Microsoft gave this talk at the MVAPICH User Group. “This talk focuses on how HPC offerings in Azure address these challenges and explains the design pillars that allow Microsoft to offer ‘bare-metal performance and scalability’ on the Microsoft Azure Cloud. This talk also covers the features of the latest Microsoft Azure HPC offerings and provides in-depth performance insights and recommendations for using MVAPICH2 and MVAPICH2-X on Microsoft Azure.”

Video: Managing HPC Software Complexity with Spack

Greg Becker from LLNL gave this talk at the MVAPICH User Group. “Spack is an open-source package manager for HPC. This presentation will give an overview of Spack, including recent developments and a number of items on the near-term roadmap. We will focus on Spack features relevant to the MVAPICH community; these include Spack’s virtual package abstraction, which is used for API-compatible libraries including MPI implementations, package-level compiler wrappers, and packages that modify other packages’ build environments.”