Checkpointing the Un-checkpointable: MANA and the Split-Process Approach


In this video from the MVAPICH User Group, Gene Cooperman from Northeastern University presents: Checkpointing the Un-checkpointable: MANA and the Split-Process Approach.

Checkpointing is the ability to save the state of a running process to stable storage and later restart that process from the point at which it was checkpointed. Transparent checkpointing (also known as system-level checkpointing) refers to the ability to checkpoint a (possibly MPI-parallel or distributed) application without modifying the binaries of that target application. Traditional wisdom has held that the transparent-checkpointing approach carries some natural restrictions. Examples of long-held restrictions are: (i) the need for a separate network-aware checkpoint-restart module for each network that will be targeted (e.g., one for TCP, one for InfiniBand, one for Intel Omni-Path, and so on); (ii) the impossibility of transparently checkpointing a CUDA-based GPU application that uses NVIDIA UVM (UVM is “unified virtual memory”, which allows the host CPU and the GPU device to access the same virtual address space at the same time); and (iii) the impossibility of transparently checkpointing an MPI application that was compiled for one MPI library implementation (e.g., MPICH or Open MPI) and then restarting it under an MPI implementation with targeted optimizations (e.g., MVAPICH2-X or MVAPICH2-EA).

This talk breaks free from the restrictions described above and presents an efficient new software architecture: split processes. The “MANA for MPI” software demonstrates this split-process architecture. The MPI application code resides in “upper-half memory”, and the MPI/network libraries reside in “lower-half memory”. The tight coupling of the upper and lower halves ensures low runtime overhead. And yet, when restarting from a checkpoint, “MANA for MPI” allows one to replace the original lower half with a different MPI library implementation. This different MPI implementation may offer specialized features such as enhanced intra- and inter-node point-to-point performance and enhanced collective-communication performance (e.g., with MVAPICH2-X), or better energy awareness (e.g., with MVAPICH2-EA). Further, the new lower-half MPI library may be optimized to run on different hardware, including a different network interconnect, a different number of CPU cores, a different configuration of ranks per node, and so on. This makes cross-cluster migration both efficient and practical. This talk represents joint work with Rohan Garg and Gregory Price.
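To make the division of roles concrete, the following is a minimal, hypothetical sketch in C, not MANA’s actual code: an invented upper-half wrapper forwards MPI calls through function pointers into whichever MPI library has been loaded as the lower half, so that a different MPI implementation could in principle be loaded when restarting. The wrapper itself and the LOWER_HALF_MPI_LIB environment variable are assumptions made only for this illustration.

/* Illustrative sketch only -- NOT MANA's actual implementation.
 * A hypothetical upper-half wrapper library forwards MPI calls through
 * function pointers into whichever MPI library is currently loaded as
 * the "lower half".  On restart, a different MPI library (e.g., an
 * MVAPICH2-X build) could be named in LOWER_HALF_MPI_LIB (an invented
 * variable) and loaded in place of the original one. */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Function pointers resolved inside the lower-half MPI library. */
static int (*lh_MPI_Init)(int *, char ***);
static int (*lh_MPI_Finalize)(void);

/* Load (or, on restart, re-load) the lower-half MPI library. */
__attribute__((constructor))
static void load_lower_half(void) {
  const char *path = getenv("LOWER_HALF_MPI_LIB");
  void *h = dlopen(path ? path : "libmpi.so", RTLD_NOW | RTLD_LOCAL);
  if (!h) { fprintf(stderr, "dlopen failed: %s\n", dlerror()); exit(1); }
  lh_MPI_Init     = (int (*)(int *, char ***)) dlsym(h, "MPI_Init");
  lh_MPI_Finalize = (int (*)(void))            dlsym(h, "MPI_Finalize");
}

/* Upper-half wrappers seen by the (unmodified) MPI application. */
int MPI_Init(int *argc, char ***argv) { return lh_MPI_Init(argc, argv); }
int MPI_Finalize(void)                { return lh_MPI_Finalize(); }

In this toy version, the wrapper would be built as a shared library that the application loads in place of the MPI library. The point of the analogy is the one made in the talk: because the application sees only the upper-half interface, the lower half (the MPI and network libraries) can be discarded at checkpoint time and re-initialized, possibly as a different MPI implementation, at restart.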

Professor Gene Cooperman works in high-performance computing and scalable applications for computational algebra. He received his B.S. from the University of Michigan in 1974 and his Ph.D. from Brown University in 1978. He then spent six years in basic research at GTE Laboratories. He came to Northeastern University in 1986 and has been a full professor there since 1992. His visiting research positions include a five-year IDEX Chair of Attractivity at the University of Toulouse/CNRS in France, and sabbaticals at Concordia University, at CERN, and at Inria. He is one of the more than 100 co-authors on the foundational Geant4 paper, whose citation count currently stands at 25,000. The extension of the million-line Geant4 code to use multi-threading (Geant4-MT) was accomplished in 2014, based on joint work with his PhD student, Xin Dong. Prof. Cooperman currently leads the DMTCP project (Distributed Multi-Threaded CheckPointing) for transparent checkpointing. The project began in 2004 and has benefited from a series of PhD theses. Over 100 refereed publications cite DMTCP as having contributed to their research. Prof. Cooperman’s current interests center on the frontiers of extending transparent checkpointing to new architectures. His work has been applied to VLSI circuit simulators, circuit verification (e.g., by Intel, Mentor Graphics, and others), formalization of mathematics, bioinformatics, network simulators, high-energy physics, cyber-security, big data, middleware, mobile computing, cloud computing, virtualization of GPUs, and, of course, high-performance computing.

See more talks from the MVAPICH User Group

Check out our insideHPC Events Calendar