10 Things Not to Miss at ISC 2017 in Frankfurt

In this special guest feature, Kim McMahon checks in from Frankfurt to give us a preview of ISC 2017. There is much in store this week, so be sure not to miss a beat!

How InfiniBand is Powering New Capabilities for Machine Learning with RDMA

In this video from GTC 2017, Scot Schultz from Mellanox describes how high-performance InfiniBand is powering new capabilities for machine learning with RDMA. “Mellanox solutions accelerate many of the world’s leading artificial intelligence and machine learning platforms. Mellanox solutions enable companies and organizations such as Baidu, Facebook, NVIDIA, PayPal, Tencent, Yahoo and many more to leverage machine learning platforms to enhance their competitive advantage.”

RoCE Initiative Launches Online Product Directory

Today the RoCE Initiative at the InfiniBand Trade Association announced the availability of the RoCE Product Directory. The new online resource is intended to inform CIOs and enterprise data center architects about their options for deploying RDMA over Converged Ethernet (RoCE) technology within their Ethernet infrastructure.

Rock Stars of HPC: DK Panda

As our newest Rock Star of HPC, DK Panda sat down with us to discuss his passion for teaching High Performance Computing. “During the last several years, HPC systems have been going through rapid changes to incorporate accelerators. The main software challenges for such systems have been to provide efficient support for programming models with high performance and high productivity. For NVIDIA GPU-based systems, my team introduced the novel “CUDA-aware MPI” concept seven years ago. This paradigm frees application developers from having to use explicit CUDA calls to perform data movement.”

Managing Node Configuration with 1000s of Nodes

Ira Weiny from Intel presented this talk at the OpenFabrics Workshop. “Individual node configuration when managing thousands or tens of thousands of nodes in a cluster can be a daunting challenge. Two key daemons, now part of the rdma-core package, aid the management of individual nodes in a large fabric: IBACM and rdma-ndd.”
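
As a rough sketch of what deploying these two daemons might look like, on a distribution that packages rdma-core with systemd units (the exact service names below are assumed from the rdma-core packaging and may vary by distribution):

```shell
# Enable the address resolution/caching daemon (IBACM) and the
# node description daemon (rdma-ndd) shipped with rdma-core.
# Service names may differ across distributions -- verify locally.
systemctl enable --now ibacm.service
systemctl enable --now rdma-ndd.service

# Confirm both daemons are active on the node
systemctl status ibacm.service rdma-ndd.service
```

In a large fabric this would typically be pushed out through the site's configuration management rather than run by hand on each node.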

Building Efficient HPC Clouds with MVAPICH2 and RDMA-Hadoop over SR-IOV IB Clusters

Xiaoyi Lu from Ohio State University presented this talk at the OpenFabrics Workshop. “Single Root I/O Virtualization (SR-IOV) technology has been steadily gaining momentum for high performance interconnects such as InfiniBand. SR-IOV can deliver near-native performance but lacks locality-aware communication support. This talk presents an efficient approach to building HPC clouds based on MVAPICH2 and RDMA-Hadoop with SR-IOV.”

Experiences with NVMe over Fabrics

“Using RDMA, NVMe over Fabrics (NVMe-oF) provides the high-bandwidth and low-latency characteristics of NVMe to remote devices. Moreover, these performance traits are delivered with negligible CPU overhead, as the bulk of the data transfer is conducted by RDMA. In this session, we present an overview of NVMe-oF and its implementation in Linux. We point out the main design choices and evaluate NVMe-oF performance for both InfiniBand and RoCE fabrics.”
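
As an illustration of what the Linux implementation looks like from the host side, a client can attach remote namespaces with the nvme-cli tool, assuming a target is already exported over an RDMA fabric (the address, port, and NQN below are placeholders, not values from the talk):

```shell
# Load the RDMA transport for the NVMe host driver
modprobe nvme-rdma

# Discover subsystems exported by the target (placeholder address/port)
nvme discover -t rdma -a 192.168.1.10 -s 4420

# Connect to a discovered subsystem by its NQN (placeholder NQN)
nvme connect -t rdma -a 192.168.1.10 -s 4420 \
    -n nqn.2017-06.io.example:nvme-target

# The remote namespace now appears as a local NVMe block device
nvme list
```

Once connected, the remote device is addressed like a local NVMe drive, which is what lets the RDMA data path deliver NVMe-like latency with little host CPU involvement.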

Video: RDMA on ARM

Pavel Shamis from ARM Research presented this talk at the OpenFabrics Workshop. “With the emerging availability of server platforms based on the ARM CPU architecture, it is important to understand how ARM integrates with the RDMA hardware and software ecosystem. In this talk, we will give an overview of the ARM architecture and system software stack. We will discuss how the ARM CPU interacts with network devices and accelerators. In addition, we will share our experience in enabling the RDMA software stack (OFED/MOFED Verbs) and one-sided communication libraries (Open UCX, OpenSHMEM/SHMEM) on ARM, and share preliminary evaluation results.”

Designing HPC & Deep Learning Middleware for Exascale Systems

DK Panda from Ohio State University presented this deck at the 2017 HPC Advisory Council Stanford Conference. “This talk will focus on challenges in designing runtime environments for exascale systems with millions of processors and accelerators to support various programming models. We will focus on MPI, PGAS (OpenSHMEM, CAF, UPC and UPC++) and Hybrid MPI+PGAS programming models by taking into account support for multi-core, high-performance networks, accelerators (GPGPUs and Intel MIC), virtualization technologies (KVM, Docker, and Singularity), and energy-awareness. Features and sample performance numbers from the MVAPICH2 libraries will be presented.”

GIGABYTE Selects Cavium QLogic FastLinQ Ethernet Solutions

“GIGABYTE servers – across standard, Open Compute Platform (OCP) and rack scale form factors – deliver exceptional value, performance and scalability for multi-tenant cloud and virtualized enterprise datacenters,” said Etay Lee, GM of GIGABYTE Technology’s Server Division. “The addition of QLogic 10GbE and 25GbE FastLinQ Ethernet NICs in OCP and Standard form factors will enable delivery on all of the tenets of open standards, while enabling key virtualization technologies like SR-IOV and full offloads for overlay networks using VxLAN, NVGRE and GENEVE.”