High-Performance and Scalable Designs of Programming Models for Exascale Systems

“This talk will focus on the challenges of designing programming models and runtime environments for exascale systems with millions of processors and accelerators. We will focus on MPI+X (PGAS – OpenSHMEM/UPC/CAF/UPC++, OpenMP, and CUDA) programming models, taking into account support for multi-core systems (KNL and OpenPOWER), high-performance networks, GPGPUs (including GPUDirect RDMA), and energy-awareness.”
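The “X” in MPI+X typically denotes a node-level model such as OpenMP or CUDA layered under MPI. As a rough illustration of the pattern (not taken from the talk), here is a minimal hybrid MPI+OpenMP sketch; it assumes an MPI library providing MPI_THREAD_FUNNELED support and an OpenMP-enabled mpicc.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, size;

        /* Ask for funneled threading: only the main thread makes MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* "X" part: OpenMP threads handle node-local work. */
        #pragma omp parallel
        {
            #pragma omp master
            printf("rank %d of %d running %d threads\n",
                   rank, size, omp_get_num_threads());
        }

        /* MPI part: communication across processes/nodes. */
        double local = rank, sum = 0.0;
        MPI_Reduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum of ranks = %g\n", sum);

        MPI_Finalize();
        return 0;
    }

The other “X” options named in the abstract (OpenSHMEM, UPC, CAF, UPC++, CUDA) slot into the same structure: MPI moves data between processes while the second model handles parallelism within a node or on an accelerator.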

Designing HPC & Deep Learning Middleware for Exascale Systems

DK Panda from Ohio State University presented this deck at the 2017 HPC Advisory Council Stanford Conference. “This talk will focus on challenges in designing runtime environments for exascale systems with millions of processors and accelerators to support various programming models. We will focus on MPI, PGAS (OpenSHMEM, CAF, UPC and UPC++) and Hybrid MPI+PGAS programming models by taking into account support for multi-core, high-performance networks, accelerators (GPGPUs and Intel MIC), virtualization technologies (KVM, Docker, and Singularity), and energy-awareness. Features and sample performance numbers from the MVAPICH2 libraries will be presented.”
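The hybrid MPI+PGAS style mentioned in the abstract mixes MPI collectives with one-sided PGAS operations in a single program. The following is a minimal sketch of that style, assuming an implementation with a unified runtime (such as MVAPICH2-X) in which MPI and OpenSHMEM calls can coexist; build and launch details vary by library.

    #include <mpi.h>
    #include <shmem.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        /* With a unified runtime, MPI and OpenSHMEM share one set of
           processes; this is a sketch, not a portable guarantee. */
        MPI_Init(&argc, &argv);
        shmem_init();

        int me   = shmem_my_pe();
        int npes = shmem_n_pes();

        /* PGAS part: one-sided put into the symmetric heap of the next PE. */
        int *sym = shmem_malloc(sizeof(int));
        *sym = -1;
        shmem_barrier_all();
        shmem_int_p(sym, me, (me + 1) % npes);
        shmem_barrier_all();
        printf("PE %d received %d via OpenSHMEM\n", me, *sym);

        /* MPI part: collective over the same set of processes. */
        int sum = 0;
        MPI_Allreduce(&me, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
        if (me == 0)
            printf("MPI_Allreduce of PE ids = %d\n", sum);

        shmem_free(sym);
        shmem_finalize();
        MPI_Finalize();
        return 0;
    }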

UCX: An Open Source Framework for HPC Network APIs and Beyond

“Unified Communication X (UCX) is a set of network APIs and their implementations for high performance computing. UCX comes from the combined efforts of national laboratories, industry, and academia to co-design and implement high-performing and highly scalable communication APIs for next-generation applications and systems. UCX solves the problem of moving data from memory location “A” to memory location “B” across multiple types of memory (DRAM, accelerator memories, etc.) and multiple transports (e.g. InfiniBand, uGNI, shared memory, CUDA, etc.), while minimizing latency and maximizing bandwidth and message rate.”
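For orientation, a minimal sketch of bringing up UCX’s high-level UCP API follows: the application declares the features it needs and UCX selects and initializes suitable transports. Error handling is abbreviated, and endpoint creation and data movement are only indicated in comments.

    #include <ucp/api/ucp.h>
    #include <stdio.h>

    int main(void)
    {
        ucp_params_t params = {0};
        ucp_worker_params_t wparams = {0};
        ucp_config_t *config;
        ucp_context_h context;
        ucp_worker_h worker;
        ucs_status_t status;

        /* Read UCX configuration from the environment (UCX_* variables). */
        status = ucp_config_read(NULL, NULL, &config);
        if (status != UCS_OK) return 1;

        /* Request the features this application needs: tag matching and RMA.
           UCX then picks transports (InfiniBand, shared memory, ...). */
        params.field_mask = UCP_PARAM_FIELD_FEATURES;
        params.features   = UCP_FEATURE_TAG | UCP_FEATURE_RMA;

        status = ucp_init(&params, config, &context);
        ucp_config_release(config);
        if (status != UCS_OK) return 1;

        /* A worker is a communication progress engine, typically one per thread. */
        wparams.field_mask  = UCP_WORKER_PARAM_FIELD_THREAD_MODE;
        wparams.thread_mode = UCS_THREAD_MODE_SINGLE;
        status = ucp_worker_create(context, &wparams, &worker);
        if (status != UCS_OK) { ucp_cleanup(context); return 1; }

        printf("UCP context and worker initialized\n");

        /* Endpoints created from this worker (ucp_ep_create) would then move
           data with calls such as ucp_put_nbx or ucp_tag_send_nbx; the peer
           address exchange needed for that is omitted from this sketch. */
        ucp_worker_destroy(worker);
        ucp_cleanup(context);
        return 0;
    }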

UPC and OpenSHMEM PGAS Models on GPU Clusters

“Learn about extensions that enable efficient use of Partitioned Global Address Space (PGAS) Models like OpenSHMEM and UPC on supercomputing clusters with NVIDIA GPUs. PGAS models are gaining attention for providing shared memory abstractions that make it easy to develop applications with dynamic and irregular communication patterns. However, the existing UPC and OpenSHMEM standards do not allow communication calls to be made directly on GPU device memory. This talk discusses simple extensions to the OpenSHMEM and UPC models to address this issue.”
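For context, the baseline these extensions build on is a standard OpenSHMEM one-sided put on host memory, sketched below. The proposed extensions themselves are not shown, since their API comes from the talk; the idea is that the same symmetric allocation and put/get calls could target GPU device buffers directly.

    #include <shmem.h>
    #include <stdio.h>

    #define N 8

    int main(void)
    {
        shmem_init();
        int me   = shmem_my_pe();
        int npes = shmem_n_pes();

        /* Symmetric heap allocation: every PE gets a buffer at the same
           symmetric address. Today this buffer lives in host memory; the
           extensions discussed in the talk aim to let such buffers, and the
           put below, operate on GPU device memory. */
        long *buf = shmem_malloc(N * sizeof(long));
        for (int i = 0; i < N; i++)
            buf[i] = me;
        shmem_barrier_all();

        /* One-sided put into the neighboring PE's symmetric buffer. */
        long src[N];
        for (int i = 0; i < N; i++)
            src[i] = me * 100 + i;
        shmem_long_put(buf, src, N, (me + 1) % npes);
        shmem_barrier_all();

        printf("PE %d: buf[0] = %ld (written by PE %d)\n",
               me, buf[0], (me - 1 + npes) % npes);

        shmem_free(buf);
        shmem_finalize();
        return 0;
    }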

Video: Preparing OpenSHMEM for Exascale

“Summit is Oak Ridge National Laboratory’s next high-performance supercomputer, based on a many-core/GPU hybrid architecture. In order to prepare OpenSHMEM for future systems, it is important to enhance its programming model to enable efficient utilization of the new hardware capabilities (e.g. massively multithreaded systems, access to different types of memory, next-generation interconnects, etc.). This session will present recent advances in the area of OpenSHMEM extensions, implementations, and tools.”
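One direction named in the abstract is massive multithreading; OpenSHMEM 1.4 later standardized thread-safety levels for exactly this purpose. Below is a minimal sketch, assuming an OpenSHMEM 1.4 library that provides SHMEM_THREAD_MULTIPLE and an OpenMP compiler, in which several threads per PE issue atomic updates concurrently.

    #include <shmem.h>
    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        int provided;

        /* OpenSHMEM 1.4 threading support: request that multiple threads
           may call the library concurrently. */
        shmem_init_thread(SHMEM_THREAD_MULTIPLE, &provided);
        if (provided != SHMEM_THREAD_MULTIPLE) {
            if (shmem_my_pe() == 0)
                printf("multithreaded OpenSHMEM not available\n");
            shmem_finalize();
            return 0;
        }

        int me   = shmem_my_pe();
        int npes = shmem_n_pes();
        long *counter = shmem_malloc(sizeof(long));
        *counter = 0;
        shmem_barrier_all();

        /* Four OpenMP threads on every PE atomically increment PE 0's counter. */
        #pragma omp parallel num_threads(4)
        shmem_long_atomic_inc(counter, 0);

        shmem_barrier_all();
        if (me == 0)
            printf("counter = %ld (expected %d)\n", *counter, 4 * npes);

        shmem_free(counter);
        shmem_finalize();
        return 0;
    }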