In this video from the 2016 Stanford HPC Conference, Pavel Shamis from ARM presents "UCX: An Open Source Framework for HPC Network APIs and Beyond."
“Unified Communication X (UCX) is a set of network APIs and their implementations for high performance computing. UCX comes from the combined efforts of national laboratories, industry, and academia to co-design and implement high-performing and highly scalable communication APIs for next generation applications and systems. UCX solves the problem of moving data from memory location “A” to memory location “B” across multiple types of memory (DRAM, accelerator memories, etc.) and multiple transports (e.g. InfiniBand, uGNI, Shared Memory, CUDA, etc.), while minimizing latency and maximizing bandwidth and message rate. We envision that through our co-design efforts, UCX will satisfy the networking needs of current and future programming models including MPI, OpenSHMEM, PGAS languages, task-based paradigms, application-centric libraries/runtimes, and I/O bound applications on a variety of HPC architectures. In this talk, we present the co-design principles of UCX, introduce its APIs, and show how they can be used to implement current and future programming models. We also present some early performance results on how MPI and OpenSHMEM benefit from UCX.”
Pavel Shamis is a Principal Research Engineer at ARM. His research interests include high-performance communication networks, communication middleware, and programming models. Prior to joining ARM, he spent five years at Oak Ridge National Laboratory (ORNL) as a research scientist in the Computer Science and Math Division (CSMD). In this role, Pavel was responsible for research and development on multiple projects in the high-performance communication domain, including Collective Communication Offload (CORE-Direct & Cheetah), OpenSHMEM, and OpenUCX. Before joining ORNL, Pavel spent ten years at Mellanox Technologies, where he led the Mellanox HPC team and was responsible for development of the HPC software stack. Pavel is a recipient of an R&D 100 Award for the development of the CORE-Direct collective offload technology.