Artificial Intelligence: A Journey to Deep Space

Recent advancements in Artificial Intelligence, especially deep learning, are set to make an impact in the field of astronomy and astrophysics. In fact, the benefits of using AI to control space-exploring robots are already being realized by missions that are currently underway.

10 Things Not to Miss at ISC 2017 in Frankfurt

In this special guest feature, Kim McMahon checks in from Frankfurt to give us a preview of ISC 2017. There is much in store this week, so be sure not to miss a beat!

How InfiniBand is Powering New Capabilities for Machine Learning with RDMA

In this video from GTC 2017, Scot Schultz from Mellanox describes how high performance InfiniBand is powering new capabilities for Machine Learning with RDMA. “Mellanox Solutions accelerate many of the world’s leading artificial intelligence and machine learning platforms. Mellanox solutions enable companies and organizations such as Baidu, Facebook, JD.com, NVIDIA, PayPal, Tencent, Yahoo and many more to leverage machine learning platforms to enhance their competitive advantage.”

RoCE Initiative Launches Online Product Directory

Today the RoCE Initiative at the InfiniBand Trade Association announced the availability of the RoCE Product Directory. The new online resource is intended to inform CIOs and enterprise data center architects about their options for deploying RDMA over Converged Ethernet (RoCE) technology within their Ethernet infrastructure.

Rock Stars of HPC: DK Panda

As our newest Rock Star of HPC, DK Panda sat down with us to discuss his passion for teaching High Performance Computing. “During the last several years, HPC systems have been going through rapid changes to incorporate accelerators. The main software challenges for such systems have been to provide efficient support for programming models with high performance and high productivity. For NVIDIA GPU-based systems, my team introduced the novel ‘CUDA-aware MPI’ concept seven years ago. This paradigm frees application developers from having to issue explicit CUDA calls to perform data movement.”
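A minimal sketch of that idea, assuming a CUDA-aware MPI build (for example MVAPICH2-GDR): the GPU device pointer is passed straight to MPI, and the library handles the data movement with no explicit cudaMemcpy staging.

```c
/* CUDA-aware MPI sketch: the device buffer is handed to MPI_Send/MPI_Recv
 * directly (assumes a CUDA-aware MPI library such as MVAPICH2-GDR). */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    double *d_buf;                        /* GPU memory, not host memory */
    cudaMalloc((void **)&d_buf, n * sizeof(double));

    if (rank == 0)
        MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```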

Managing Node Configuration with 1000s of Nodes

Ira Weiny from Intel presented this talk at the OpenFabrics Workshop. “Individual node configuration when managing thousands or tens of thousands of nodes in a cluster can be a daunting challenge. Two key daemons, IBACM and rdma-ndd, are now part of the rdma-core package and aid the management of individual nodes in a large fabric.”
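As a rough illustration of what rdma-ndd maintains (not from the talk), the sketch below reads a device's node description from sysfs; rdma-ndd keeps this string in sync with the node's hostname. The device name mlx5_0 is only an example.

```c
/* Print the per-device node description that rdma-ndd keeps up to date.
 * The device name "mlx5_0" is an example; adjust for the local HCA. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/class/infiniband/mlx5_0/node_desc", "r");
    if (!f) {
        perror("open node_desc");
        return 1;
    }

    char desc[128];
    if (fgets(desc, sizeof(desc), f))
        printf("node description: %s", desc);

    fclose(f);
    return 0;
}
```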

Building Efficient HPC Clouds with MVAPICH2 and RDMA-Hadoop over SR-IOV IB Clusters

Xiaoyi Lu from Ohio State University presented this talk at the OpenFabrics Workshop. “Single Root I/O Virtualization (SR-IOV) technology has been steadily gaining momentum for high performance interconnects such as InfiniBand. SR-IOV can deliver near-native performance but lacks locality-aware communication support. This talk presents an efficient approach to building HPC clouds based on MVAPICH2 and RDMA-Hadoop with SR-IOV.”

Experiences with NVMe over Fabrics

“Using RDMA, NVMe over Fabrics (NVMe-oF) provides the high-bandwidth and low-latency characteristics of NVMe to remote devices. Moreover, these performance traits are delivered with negligible CPU overhead, as the bulk of the data transfer is conducted by RDMA. In this session, we present an overview of NVMe-oF and its implementation in Linux. We point out the main design choices and evaluate NVMe-oF performance for both InfiniBand and RoCE fabrics.”

Video: RDMA on ARM

Pavel Shamis from ARM Research presented this talk at the OpenFabrics Workshop. “With the emerging availability of server platforms based on the ARM CPU architecture, it is important to understand how ARM integrates with the RDMA hardware and software ecosystem. In this talk, we will give an overview of the ARM architecture and system software stack. We will discuss how the ARM CPU interacts with network devices and accelerators. In addition, we will share our experience in enabling the RDMA software stack (OFED/MOFED Verbs) and one-sided communication libraries (Open UCX, OpenSHMEM/SHMEM) on ARM, and share preliminary evaluation results.”
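A minimal verbs sketch, assuming libibverbs from rdma-core is installed, of the kind of sanity check used to confirm the RDMA stack is functional on an ARM node: enumerate the available devices and open the first one.

```c
/* Enumerate RDMA devices via libibverbs and open the first one.
 * Building and running this on an ARM server is a quick check that
 * the verbs stack (rdma-core/OFED) works on the platform. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    for (int i = 0; i < num; i++)
        printf("device %d: %s\n", i, ibv_get_device_name(devs[i]));

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (ctx)
        ibv_close_device(ctx);

    ibv_free_device_list(devs);
    return 0;
}
```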

Designing HPC & Deep Learning Middleware for Exascale Systems

DK Panda from Ohio State University presented this deck at the 2017 HPC Advisory Council Stanford Conference. “This talk will focus on challenges in designing runtime environments for exascale systems with millions of processors and accelerators to support various programming models. We will focus on MPI, PGAS (OpenSHMEM, CAF, UPC and UPC++) and Hybrid MPI+PGAS programming models by taking into account support for multi-core, high-performance networks, accelerators (GPGPUs and Intel MIC), virtualization technologies (KVM, Docker, and Singularity), and energy-awareness. Features and sample performance numbers from the MVAPICH2 libraries will be presented.”
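As a rough illustration of the hybrid MPI+PGAS model the talk covers, here is a minimal sketch assuming a unified runtime (such as MVAPICH2-X) that allows MPI and OpenSHMEM calls in the same program: each PE writes its rank into a symmetric variable on its neighbor with a one-sided put, then MPI reduces the results.

```c
/* Hybrid MPI+OpenSHMEM sketch (assumes a unified runtime such as
 * MVAPICH2-X that lets both models share one job). Each PE puts its
 * rank into a symmetric variable on its right neighbor, then MPI sums
 * the received values across all ranks. */
#include <stdio.h>
#include <mpi.h>
#include <shmem.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    shmem_init();

    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    static long recv_val = 0;             /* symmetric (PGAS) variable */
    long mine = me;

    shmem_long_put(&recv_val, &mine, 1, (me + 1) % npes);
    shmem_barrier_all();

    long total = 0;
    MPI_Allreduce(&recv_val, &total, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);

    if (me == 0)
        printf("sum of neighbor ranks: %ld\n", total);

    shmem_finalize();
    MPI_Finalize();
    return 0;
}
```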