

2020 OpenFabrics Alliance Workshop – Video Gallery

Welcome to the 2020 OpenFabrics Workshop video gallery.

The OpenFabrics Alliance (OFA) is focused on accelerating development of high performance fabrics. The annual OFA Workshop, held in virtual format this year, is a premier means of fostering collaboration among those who develop fabrics, deploy fabrics, and create applications that rely on fabrics. It is the only event of its kind where fabric developers and users can discuss emerging fabric technologies, collaborate on future industry requirements, and address problems that exist today.

An FPGA Platform for Reconfigurable Heterogeneous HPC and Cloud Computing, Bernard Metzler, IBM Research Zurich * VIDEO
Dr. Bernard Metzler is a Principal Research Staff Member and Technical Leader at IBM Zurich Research Laboratory. His main research interests are in enhancing network and storage IO of distributed systems, and the integration of modern high performance IO hardware with distributed applications. He contributes to the design, standardization and implementation of IO subsystem components, such as network protocols, storage stacks, and APIs.

An Update on CXL Specification Advancements, Jim Pappas, CXL Consortium * VIDEO
Jim Pappas is Director of Technology Initiatives at Intel, with responsibility to establish broad industry ecosystems that comply with new technologies in the areas of enterprise I/O, energy efficient computing, solid state storage and persistent memory. Jim has founded or served at several organizations in these areas, including PCI-SIG, USB, SNIA, IBTA, OFA, The Green Grid (TGG), and Compute Express Link™ (CXL). Jim has over 30 years’ experience in the computer industry and holds eight U.S. patents in computer graphics and microprocessor technologies. He holds a B.S.E.E. from the University of Massachusetts, Amherst.

Designing a Deep-Learning Aware MPI Library: An MVAPICH2 Approach, Dhabaleswar Panda, The Ohio State University * VIDEO
DK Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, Omni-Path, iWARP and RoCE) libraries, designed and developed by his research group, are currently being used by more than 3,075 organizations worldwide (in 89 countries). More than 757,000 downloads of this software have taken place from the project's site. This software is empowering several InfiniBand clusters (including the 3rd, 5th, 8th, 14th, 15th, and 18th ranked ones) in the TOP500 list. Prof. Panda is an IEEE Fellow.

Distributed Asynchronous Object Storage (DAOS), Kenneth Cain, Intel, Corp. * VIDEO
Ken Cain is a software developer in Intel’s HPC organization in the Cloud and Enterprise Solutions Group. He is contributing to the development of DAOS, a very large-scale distributed storage solution. His experience in high performance networking includes switch and fabric management software, high performance host interfaces and communication middleware, and spans HPC, traditional Ethernet, and embedded systems interconnects. Before joining Intel, he contributed on teams in both research and commercial systems provider organizations.

Enhancing NVMe and NVMe-oF Configuration and Manageability with SNIA Swordfish and DMTF Redfish to Enable Scalable Infrastructures, Phil Cayton, Intel, Corp.; Rajalaxmi Angadi, Intel, Corp.; Richelle Ahlvers, Broadcom * VIDEO
Phil Cayton is Senior Staff at Intel Corporation, with 25 years’ experience developing and researching non-volatile local and remote storage and fabrics technologies, particularly NVMe, NVMe-oF, NVMe-MI, InfiniBand and iWARP architectures, resulting in 25+ patents. Both Rajalaxmi and Phil have been heavily involved in development of schemas and mockups for SNIA Swordfish in the Scalable Storage Management Technical Working Group; they have authored multiple technical proposals and contributed to the NVM Express consortium.
Rajalaxmi Angadi is a Senior Engineer at Intel Corporation. She has more than 14 years of industry experience in architecting, developing, and analyzing the performance of storage systems and fabrics technologies, including NVMe and NVMe-oF.
Richelle Ahlvers is Storage Management Software Architect at Broadcom Inc., where she defines storage management integrations, solutions, and standards strategies for the Data Center Storage Group. She also serves on the SNIA Board of Directors. Ahlvers has spent over 25 years in Enterprise Storage, leading the architecture, design and development of storage array and management software.

Enhancing OFI for Invoking Acceleration Capabilities on an Integrated Networking/Accelerator FPGA Platform (COPA), Venkata Krishnan, Intel, Corp. * VIDEO
Venkata Krishnan focuses on various aspects of microprocessor architecture, accelerators and networking. He holds a B.Tech from IIT Madras and a PhD in Computer Science from the University of Illinois at Urbana-Champaign. Prior to joining Intel, he held positions that include CTO of Dolphin Interconnect, research scientist at D. E. Shaw Research and fellow at AMD.

Gen-Z: An Open Memory Fabric for Future Data Processing Needs, Russ Herrell, Gen-Z Consortium * VIDEO
Russ Herrell is a Distinguished Technologist and has been with HPE for almost 38 years.  He has a Master’s in Electrical Engineering from Montana State University and has worked at HPE ever since leaving campus.  Russ has designed ECC memory boards, 3D graphics accelerators, virtual DMA interfaces, interrupt driven flow controlled, load/store graphics library interfaces, and 16-64 socket SMP big iron.  Most recently Russ has been working to enable the Gen-Z and Memory Driven Compute ecosystem.

How Do We Debug?, Ariel Almog, Mellanox Technologies (Speaker); Alex Rosenbaum, Mellanox Technologies; Tzahi Oved, Mellanox Technologies * VIDEO
Ariel Almog is a senior staff software architect at Nvidia (formerly Mellanox), where he helps define software architecture for networking features, including Ethernet, RDMA and InfiniBand, over various OSs. He has worked at Mellanox for the last 8 years. Prior to Mellanox, he worked on access networks and IP phones, handling ancient SS7 switches and some extinct technologies, such as ATM.

Lustre Network Multi-Rail Feature Set, Amir Shehata, Whamcloud, DDN * VIDEO
Amir Shehata has been working on Lustre networking for seven years. He implemented multiple features including dynamic configuration and the multi-rail feature set among others.

Meet the 2020 OFA Workshop TPC, Jim Ryan, OFA; Paul Grun, HPE * VIDEO
As a high-tech industry veteran, Jim Ryan brings more than 35 years of experience in developing and managing Special Interest Groups. He has experience in software development, software product management and high-performance data center networks. Jim formed the OpenFabrics Alliance to develop and promote interoperable software stacks for Remote Direct Memory Access interconnects for HPC and enterprise data centers. Since 2000, he has served as an initiative manager at Intel, where he also formed the SSI Forum to drive server infrastructure standards for blade servers, power supplies and server boards and enclosures. Prior to joining Intel, he spent time at Sequent (IBM), Bank of America and other high-tech online database companies. Ryan completed his education with an MBA from U.C. Berkeley.
Paul Grun, Chair, OpenFabrics Alliance, is a senior technologist at HPE. During his 40-year career he has been involved in all aspects of server I/O, beginning with storage for large mainframe systems, turning to high performance network architecture and now focusing on applying I/O technology to building large scale systems at Cray. His association with advanced networking technology goes back to the genesis of InfiniBand, when as a member of Intel’s Server Architecture Lab he contributed to the creation of high performance networks, going on to represent Intel to the InfiniBand Trade Association (IBTA). There he has served as chair of the Technical Working Group, as chair and principal author for the RoCE (RDMA over Converged Ethernet) specification, and on the IBTA’s Steering Committee. He is OFA’s Chair and Co-Chair of the OpenFabrics Interfaces Working Group.

MVAPICH Touches the Cloud: New Frontiers for MPI in High Performance Clouds, Dhabaleswar Panda, The Ohio State University * VIDEO
DK Panda is an IEEE Fellow and a professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, Omni-Path, iWARP and RoCE) libraries, designed and developed by his research group, are used by more than 3,075 organizations worldwide (in 89 countries). More than 757,000 downloads of this software have taken place from the project's site. This software is empowering several InfiniBand clusters (including the 3rd, 5th, 8th, 14th, 15th, and 18th ranked ones) in the TOP500 list.

oneAPI, oneCCL and OFI: Path to Heterogeneous Architecture Programming with Scalable Collective Communications, Sayantan Sur, Intel, Corp. * VIDEO
Sayantan Sur is a Principal Engineer at Intel. He is an expert in MPI, fabrics and HPC. He is currently focusing on the oneAPI stack for Intel’s Xe GPUs. Sayantan is well known in the fabrics community for his work in scaling MPI implementations over InfiniBand. His work enabled MPI to scale to thousands of InfiniBand connected nodes. He has published more than 30 papers in major conferences and delivered tutorials at conferences, such as Supercomputing and Hot Interconnects. He received his Ph.D. from The Ohio State University in 2007.

RDMA with GPU Memory via DMA-Buf, Jianxin Xiong, Intel, Corp. * VIDEO
Jianxin Xiong is a Software Engineer at Intel. For 15+ years he has worked on various layers of interconnection software stack, such as RDMA drivers in Linux kernel, RDMA device virtualization, Open Fabric Interface, DAPL, Tag Matching Interface, and Intel MPI. His current focus is GPU/accelerator scale-out with RDMA devices.

Remote Persistent Memory Access API – The Second Approach, Tomasz Gromadzki, Intel, Corp.; Jan Michalski, Intel, Corp. * VIDEO
Tomasz Gromadzki is a software architect at Intel’s Memory & Storage Product Group. His focus is on remote persistent memory access, including proper integration of persistent memory with other (networking) technologies as well as optimal persistent memory replication procedures and algorithms. Before joining Intel in 2018, for over 20 years, Tomasz designed, implemented, and deployed a variety of communication solutions for power distribution, industrial, and mining automation systems. He holds a Master of Science in Computer Science from the Gdansk University of Technology, Poland.
Jan Michalski is a software engineer in Intel’s Memory & Storage Products Group. He focuses on remote persistent memory access, which includes proper integration of persistent memory with other technologies, as well as looking for optimal persistent-memory replication procedures and algorithms. He holds a master’s degree in computer engineering from the Gdańsk University of Technology, Poland, where he studied system software engineering.

SparkUCX – RDMA Acceleration Plug-in for Spark, Peter Rudenko, Mellanox Technologies * VIDEO
Peter Rudenko is a software engineer in the Mellanox HPC team, focusing on accelerating data intensive applications and developing the UCX communication library and various big data solutions.

SPDK-based User Space NVMe over TCP Transport Solution, Ziye Yang, Intel, Corp. * VIDEO
Ziye Yang is a cloud software engineer at Intel and is involved in SPDK (Storage Performance Development Kit) development work. Before that, Yang worked at EMC for 4.5 years. Yang is interested in system virtualization, file system and storage related research and development work. Ziye currently has 15 issued patents in the US and 7 issued patents in the PRC, and received a master's degree in computer science from Fudan University in 2009.

Status of OpenFabrics Interfaces (OFI) Support in MPICH, Yanfei Guo, Argonne National Laboratory * VIDEO
Dr. Yanfei Guo holds an appointment of Assistant Computer Scientist at the Argonne National Laboratory. He is a member of the Programming Models and the Runtime Systems Group. His research interests include parallel programming models and runtime systems in extreme-scale supercomputing systems, data-intensive computing and cloud computing systems. He received the best paper award at the USENIX International Conference on Autonomic Computing 2013 (ICAC13). His work on programming models and runtime systems has been published in peer-reviewed conferences and journals including the ACM/IEEE Supercomputing Conference (SC14, SC15) and IEEE Transactions on Parallel and Distributed Systems (TPDS).

Toward an Open Fabric Management Architecture, Russ Herrell, HPE * VIDEO
Russ Herrell is a Distinguished Technologist and has been with HPE for almost 38 years.  Russ has a Master’s in Electrical Engineering from Montana State University and has worked at HPE since.  Russ has designed ECC memory boards, 3D graphics accelerators, virtual DMA interfaces, interrupt driven flow controlled, load/store graphics library interfaces, and 16-64 socket SMP big iron.  Most recently Russ has worked to enable the Gen-Z and Memory Driven Compute ecosystem.

TriEC: An Efficient Erasure Coding NIC Offload Paradigm Based on Tripartite Graph Model, Xiaoyi Lu, The Ohio State University * VIDEO
Dr. Xiaoyi Lu is a Research Assistant Professor in the Department of Computer Science and Engineering at the Ohio State University. His research interests include parallel computing, high-performance interconnects and protocols, big data analytics, cloud computing and deep learning system software. He has published more than 100 papers in international conferences, workshops and journals with multiple awards and nominations. Many of Dr. Lu’s research outcomes (e.g., HiBD, MVAPICH2-Virt, DataMPI, RDMA-TensorFlow, NeuroHPC) are made publicly available to the community. He is a member of IEEE and ACM.

Using Libfabric for Scalable Distributed Machine Learning: Use Cases, Learnings and Best Practices, Rashika Kheria, Amazon * VIDEO
Rashika Kheria is a Senior Engineer at Amazon Web Services, where she has worked on performant and scalable networking and storage solutions for EC2 customers. Most recently, she’s been focused on enabling customers to run large scale machine learning applications in a performant way. Before this, she worked on developing and providing the Amazon Linux Kernel and Hypervisor to millions of customers, and contributed patches to the Linux Kernel, where she was the 4th most active developer in 2014. She works with open source software.

Using SPDK to Optimize Your NVMe-oF RDMA Stack, Seth Howell, Intel, Corp.; Alexey Marchuk, Mellanox Technologies * VIDEO
Howell graduated from Arizona State University with a bachelor's degree in computer systems engineering. He has worked at Intel Corporation since 2017 on the SPDK project, starting with continuous integration and automated testing. Much of his development time has focused on the NVMe-oF feature. He also manages library architecture and versioning.
Marchuk works at Mellanox on the SPDK project and is a member of the SPDK core maintainers team. He focuses on RDMA and NVMe-oF optimizations. He has experience in network protocols and in distributed machine learning, and he participated in the development of the Intel oneCCL library.

Visualize and Analyze Your Network Activities using OSU INAM, Hari Subramoni, The Ohio State University * VIDEO
Dr. Hari Subramoni received the Ph.D. degree in computer science from The Ohio State University in 2013. He is a research scientist there in the Department of Computer Science and Engineering. His interests include high performance interconnects and protocols, parallel computer architecture, network-based computing, exascale computing, network topology aware computing, QoS, power-aware LAN-WAN communication, fault tolerance, virtualization, deep learning, big data and cloud computing. He has published over 50 papers in international journals and conferences. He is working on the design and development of MVAPICH2, MVAPICH2-GDR, MVAPICH2-X, and OSU INAM software packages. He is a member of IEEE.

 
