Designing Scalable HPC, Deep Learning, Big Data, and Cloud Middleware for Exascale Systems

DK Panda from Ohio State University gave this talk at the UK HPC Conference. “This talk will focus on challenges in designing HPC, Deep Learning, Big Data and HPC Cloud middleware for Exascale systems with millions of processors and accelerators. For the HPC domain, we will discuss the challenges in designing runtime environments for MPI+X (PGAS – OpenSHMEM/UPC/CAF/UPC++, OpenMP, and CUDA) programming models, taking into account support for multi-core systems (Xeon, ARM and OpenPower), high-performance networks, and GPGPUs (including GPUDirect RDMA).”
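
As a rough illustration of the MPI+X model the talk covers, here is a minimal hybrid MPI + OpenMP sketch (a generic example, not code from the MVAPICH2 project): each rank computes a partial sum in an OpenMP parallel region, then the ranks combine results with MPI_Allreduce.

```c
/* Minimal MPI+OpenMP hybrid sketch; compile with e.g. mpicc -fopenmp mpix.c */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    /* Ask for FUNNELED: only the main thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank sums its chunk of work across its OpenMP threads. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0 / (double)(rank * 1000000 + i + 1);

    /* Combine the per-rank partial sums across the whole job. */
    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f (ranks=%d, threads per rank=%d)\n",
               global, size, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}
```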

Mellanox Powers Virtualized Machine Learning with VMware and NVIDIA

Today Mellanox announced that its RDMA (Remote Direct Memory Access) networking solutions for VMware vSphere enable virtualized Machine Learning solutions that achieve higher GPU utilization and efficiency. “As Moore’s Law has slowed, traditional CPU and networking technologies are no longer sufficient to support the emerging machine learning workloads,” said Kevin Deierling, vice president of marketing, Mellanox Technologies. “Using hardware compute accelerators such as NVIDIA T4 GPUs and Mellanox’s RDMA networking solutions has proven to boost application performance in virtualized deployments.”

RDMA, Scalable MPI-3 RMA, and Next-Generation Post-RDMA Interconnects

Torsten Hoefler from ETH Zurich gave this talk at the Swiss HPC Conference. “Network cards contain rather powerful processors optimized for data movement and limiting the functionality to remote direct memory access seems unnecessarily constraining. We develop sPIN, a portable programming model to offload simple packet processing functions to the network card.”
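
The actual sPIN handler interface is not reproduced here; purely as an illustration of the idea of offloading simple per-packet functions to the NIC, the C sketch below defines a handler that delivers or drops packets and keeps a small piece of per-flow state. All types and names are hypothetical, and a host-side loop stands in for the NIC runtime.

```c
/* Illustration of the per-packet handler idea behind NIC offload.
 * All types and names are hypothetical; this is NOT the sPIN API. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

struct pkt {                    /* a packet as the handler would see it */
    const uint8_t *payload;
    size_t         len;
};

struct flow_state {             /* small per-flow state kept near the NIC */
    uint64_t matched;
};

/* Invoked once per arriving packet: deliver tagged packets, drop the rest. */
static int packet_handler(const struct pkt *p, struct flow_state *st)
{
    if (p->len > 0 && p->payload[0] == 0x42) {
        st->matched++;
        return 1;               /* deliver to host memory */
    }
    return 0;                   /* filter out before it reaches the host */
}

int main(void)                  /* host-side driver simulating two packets */
{
    uint8_t a[] = { 0x42, 1, 2 }, b[] = { 0x00, 3 };
    struct pkt pkts[] = { { a, sizeof a }, { b, sizeof b } };
    struct flow_state st = { 0 };

    for (size_t i = 0; i < 2; i++)
        printf("packet %zu -> %s\n", i,
               packet_handler(&pkts[i], &st) ? "deliver" : "drop");
    printf("matched = %llu\n", (unsigned long long)st.matched);
    return 0;
}
```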

Agenda Posted: Exacomm 2019 Workshop at ISC High Performance

“The goal of this workshop is to bring together researchers and software/hardware designers from academia, industry and national laboratories who are involved in creating network-based computing solutions for extreme scale architectures. The objectives of this workshop will be to share the experiences of the members of this community and to learn the opportunities and challenges in the design trends for exascale communication architectures.”

HPC Breaks Through to the Cloud: Why It Matters

In this special guest feature, Scot Schultz from Mellanox writes that researchers are benefiting in a big way from HPC in the Cloud. “HPC has many different advantages depending on the specific use case, but one aspect that these implementations have in common is their use of RDMA-based fabrics to improve compute performance and reduce latency.”

Faster Fabrics Running Against Limits of the Operating System, the Processor, and the I/O Bus

Christopher Lameter from Jump Trading gave this talk at the OpenFabrics Workshop in Austin. “In 2017 we got 100G fabrics, in 2018 200G fabrics, and in 2019 it looks like 400G technology may see a considerable amount of adoption. These bandwidths compete with, and sometimes exceed, the internal bus speeds of the servers connected by these fabrics. I think we need to consider these developments and work on improving fabrics and the associated APIs so that these features can be accessed through vendor-neutral APIs. It needs to be possible to code in a portable way and not to a vendor-specific one.”
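
The verbs interface from the OpenFabrics stack is one example of the kind of vendor-neutral API the talk argues for. A minimal sketch, assuming a system with libibverbs installed (link with -libverbs), that enumerates RDMA devices and prints a few capability limits:

```c
/* Enumerate RDMA devices via libibverbs; compile with: gcc verbs_list.c -libverbs */
#include <infiniband/verbs.h>
#include <stdio.h>

int main(void)
{
    int num = 0;
    struct ibv_device **list = ibv_get_device_list(&num);
    if (!list) {
        perror("ibv_get_device_list");
        return 1;
    }

    for (int i = 0; i < num; i++) {
        struct ibv_context *ctx = ibv_open_device(list[i]);
        if (!ctx)
            continue;

        struct ibv_device_attr attr;
        if (ibv_query_device(ctx, &attr) == 0)
            printf("%s: ports=%d max_qp=%d max_mr_size=%llu\n",
                   ibv_get_device_name(list[i]),
                   attr.phys_port_cnt, attr.max_qp,
                   (unsigned long long)attr.max_mr_size);

        ibv_close_device(ctx);
    }

    ibv_free_device_list(list);
    return 0;
}
```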

Accelerating TensorFlow with RDMA for High-Performance Deep Learning

Xiaoyi Lu from Ohio State University gave this talk at the 2019 OpenFabrics Workshop in Austin. “Google’s TensorFlow is one of the most popular Deep Learning (DL) frameworks. We propose a unified way of achieving high performance through enhancing the gRPC runtime with Remote Direct Memory Access (RDMA) technology on InfiniBand and RoCE. Through our proposed RDMAgRPC design, TensorFlow only needs to run over the gRPC channel and gets the optimal performance.”

Mellanox HDR 200G InfiniBand Speeds Machine Learning with NVIDIA

Today Mellanox announced that its HDR 200G InfiniBand with the “Scalable Hierarchical Aggregation and Reduction Protocol” (SHARP) technology has set new performance records, doubling deep learning operations performance. The combination of Mellanox In-Network Computing SHARP with NVIDIA V100 Tensor Core GPU technology and the NVIDIA Collective Communications Library (NCCL) delivers leading efficiency and scalability to deep learning and artificial intelligence applications.
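
For reference, deep learning frameworks typically reach these SHARP-accelerated reductions through NCCL’s collective API. Below is a minimal single-process, multi-GPU all-reduce sketch, assuming CUDA and NCCL are installed; error checking is omitted for brevity.

```c
/* Single-process multi-GPU all-reduce with NCCL; link with -lnccl -lcudart */
#include <nccl.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev < 1) { fprintf(stderr, "no GPUs found\n"); return 1; }

    ncclComm_t   *comms   = (ncclComm_t *)malloc(ndev * sizeof(ncclComm_t));
    cudaStream_t *streams = (cudaStream_t *)malloc(ndev * sizeof(cudaStream_t));
    float       **buf     = (float **)malloc(ndev * sizeof(float *));
    const size_t  count   = 1 << 20;            /* elements per GPU */

    /* One communicator per local GPU (devices 0..ndev-1). */
    ncclCommInitAll(comms, ndev, NULL);

    for (int i = 0; i < ndev; i++) {
        cudaSetDevice(i);
        cudaStreamCreate(&streams[i]);
        cudaMalloc((void **)&buf[i], count * sizeof(float));
        cudaMemset(buf[i], 0, count * sizeof(float));
    }

    /* In-place sum across all GPUs; group the calls so NCCL launches them together. */
    ncclGroupStart();
    for (int i = 0; i < ndev; i++)
        ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < ndev; i++) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        cudaFree(buf[i]);
        ncclCommDestroy(comms[i]);
    }
    printf("all-reduce of %zu floats across %d GPUs complete\n", count, ndev);
    free(buf); free(streams); free(comms);
    return 0;
}
```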

Video: Why InfiniBand is the Way Forward for AI and Exascale

In this video, Gilad Shainer from the InfiniBand Trade Association describes how InfiniBand offers the optimal interconnect technology for AI, HPC, and Exascale. “Through AI, you need the biggest pipes in order to move those giant amounts of data in order to create those AI software algorithms. That’s one thing. Latency is important because you need to drive things faster. RDMA is one of the key technologies that enables increasing the efficiency of moving data, reducing CPU overhead. And by the way, now all of the AI frameworks that exist out there support RDMA as a default element within the framework itself.”

How to Design Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systems

DK Panda from Ohio State University gave this talk at the Stanford HPC Conference. “This talk will focus on challenges in designing HPC, Deep Learning, and HPC Cloud middleware for Exascale systems with millions of processors and accelerators. For the HPC domain, we will discuss the challenges in designing runtime environments for MPI+X (PGAS – OpenSHMEM/UPC/CAF/UPC++, OpenMP, and CUDA) programming models, taking into account support for multi-core systems (Xeon, OpenPower, and ARM), high-performance networks, GPGPUs (including GPUDirect RDMA), and energy-awareness.”
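
GPUDirect RDMA is usually exposed to applications through a CUDA-aware MPI library (for example MVAPICH2-GDR or a CUDA-aware Open MPI build), which accepts GPU device pointers directly in communication calls. A minimal two-rank sketch, assuming such a build; it is an illustrative pattern, not code from the talk.

```c
/* CUDA-aware MPI sketch: rank 0 sends a GPU buffer directly to rank 1.
 * Assumes a CUDA-aware MPI build (e.g. MVAPICH2-GDR or Open MPI with CUDA). */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    float *dbuf = NULL;
    cudaMalloc((void **)&dbuf, n * sizeof(float));

    if (rank == 0) {
        cudaMemset(dbuf, 0, n * sizeof(float));
        /* The device pointer goes straight into MPI_Send; the library
         * moves the data via GPUDirect RDMA when the fabric supports it. */
        MPI_Send(dbuf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(dbuf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d floats into GPU memory\n", n);
    }

    cudaFree(dbuf);
    MPI_Finalize();
    return 0;
}
```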