Amazon Elastic Fabric Adapter: Anatomy, Capabilities, and the Road Ahead

Raghu Raja from Amazon gave this talk at the OpenFabrics Workshop in Austin. “Elastic Fabric Adapter (EFA) is the recently announced HPC networking offering from Amazon for EC2 instances. It allows applications such as MPI to communicate using the Scalable Reliable Datagram (SRD) protocol that provides connectionless and unordered messaging services directly in userspace, bypassing both the operating system kernel and the Virtual Machine hypervisor. This talk presents the designs, capabilities, and an early performance characterization of the userspace and kernel components of the EFA software stack.”

Characteristics of Remote Persistent Memory – Performance, Capacity, or Locality?

Paul Grun from Cray gave this talk at the OpenFabrics Workshop in Austin. “Persistent Memory exhibits several interesting characteristics including persistence, capacity and others. These (sometimes) competing characteristics may require system and server architects to make tradeoffs in system architecture. In this session, we explore some of those tradeoffs and take an early look at the emerging use cases for Remote Persistent Memory and how those may impact network architecture and API design.”

Accelerating TensorFlow with RDMA for High-Performance Deep Learning

Xiaoyi Lu from Ohio State University gave this talk at the 2019 OpenFabrics Workshop in Austin. “Google’s TensorFlow is one of the most popular Deep Learning (DL) frameworks. We propose a unified way of achieving high performance through enhancing the gRPC runtime with Remote Direct Memory Access (RDMA) technology on InfiniBand and RoCE. Through our proposed RDMAgRPC design, TensorFlow only needs to run over the gRPC channel and gets the optimal performance.”

The State of High-Performance Fabrics: A Chat with the OpenFabrics Alliance

In this special guest feature, Paul Grun and Doug Ledford from the OpenFabrics Alliance describe the industry trends in the fabrics space, its state of affairs and emerging applications. “Originally, ‘high-performance fabrics’ were associated with large, exotic HPC machines. But in the modern world, these fabrics, which are based on technologies designed to improve application efficiency, performance, and scalability, are becoming more and more common in the commercial sphere because of the increasing demands being placed on commercial systems.”

Call for Participation: OFA Workshop in Austin

The OpenFabrics Alliance has issued its Call for Participation for the 2019 OFA Workshop. The event takes place March 20-21 in Austin, Texas. “The annual OFA Workshop is a premier means of fostering collaboration among those who develop fabrics, deploy fabrics, and create applications that rely on fabrics. It is the only event of its kind where fabric developers and users can discuss emerging fabric technologies, collaborate on future industry requirements, and address problems that exist today.”

OFA Expands Mission to Boost Development of Advanced Network and Fabric Technologies

Today the OpenFabrics Alliance (OFA) unveiled an expanded mission to accelerate the development and adoption of advanced fabric technologies. This is a significant expansion of its original mission from 2004, which was to facilitate the rapid adoption of an emerging network technology, known as the InfiniBand Architecture. The new mission expands its scope to include software for the entirety of the advanced networks landscape, including the InfiniBand Architecture. “The 15th Annual OFA Workshop is returning to Austin, Texas – March 19-21, 2019 at the University of Texas at Austin.”

High-Performance Big Data Analytics with RDMA over NVM and NVMe-SSD

Xiaoyi Lu from OSU gave this talk at the 2018 OpenFabrics Workshop. “The convergence of Big Data and HPC has been pushing the innovation of accelerating Big Data analytics and management on modern HPC clusters. Recent studies have shown that the performance of Apache Hadoop, Spark, and Memcached can be significantly improved by leveraging the high-performance networking technologies, such as Remote Direct Memory Access (RDMA). In this talk, we propose new communication and I/O schemes for these data analytics stacks, which are designed with RDMA over NVM and NVMe-SSD.”

The OpenFabrics Alliance 2018 Annual Workshop Recap

“The 14th Annual OpenFabrics Alliance (OFA) Workshop, held in scenic Boulder, Colorado, recently concluded its week-long, community-wide collaboration and dialogue on OpenFabrics. As the premier means of fostering lively discussions among those who develop fabrics, deploy fabrics, and create applications that rely on fabrics, the Workshop is the ideal venue for the OpenFabrics community and networking industry at large to identify and address the wide variety of emerging industry requirements and challenges that remain.”

Accelerating Ceph with RDMA and NVMe-oF

Haodong Tang from Intel gave this talk at the 2018 OpenFabrics Workshop. “An efficient network messenger is critical for today’s scale-out storage systems. Ceph is one of the most popular distributed storage systems, providing scalable and reliable object, block and file storage services. As the explosive growth of Big Data continues, there is strong demand to leverage Ceph to build high-performance, ultra-low-latency storage solutions in cloud and big data environments. Traditional TCP/IP cannot satisfy this requirement, but Remote Direct Memory Access (RDMA) can.”

Amazon and Libfabric: A case study in flexible HPC Infrastructure

Brian Barrett from Amazon gave this talk at the 2018 OpenFabrics Workshop. “As network performance becomes a larger bottleneck in application performance, AWS is investing in improving HPC network performance. Our initial investment focused on improving performance in open source MPI implementations, with positive results. Recently, however, we have pivoted to focusing on using libfabric to improve point-to-point performance.”