In this video, Gilad Shainer from the InfiniBand Trade Association describes how InfiniBand offers the optimal interconnect technology for AI, HPC, and exascale.
Transcript:
insideHPC: Hi, I’m Rich with insideHPC. We’re here at the Stanford HPC Conference. And I’m here with Gilad Shainer from the InfiniBand Trade Association. Gilad, you work in a number of capacities including this HPC Advisory Council. But first, I want to ask you what is the IBTA and what’s it for?
Gilad Shainer: Well, the IBTA is a consortium that was established, in the beginning, to create the specification for the InfiniBand network standard, which, as we all know, powers some of the largest supercomputers in the world and the leading AI infrastructures as well. Several years ago, the IBTA also took on the work of specifying RoCE. So actually, everything RDMA is being specified within the IBTA organization.
insideHPC: Well, we’re coming soon onto this age of exascale. Two, three years from now, we’re going to have exascale machines. And now, they’re bringing AI into the mix. Gilad, can you tell me how InfiniBand helps with these giant exascale machines?
Gilad Shainer: Yes. So InfiniBand was born in 1999. It’s basically the ultimate software-defined network. And because it was born as a technology that lets you configure everything in a highly flexible way, InfiniBand lets you build any sort of infrastructure at any size with the same simple building blocks. So it provides the best performance in the sense of lowest latency and high bandwidth, the flexibility to build different kinds of topologies that can meet the requirements of different workloads, and the ability to add more and more acceleration engines into the fabric in a very simple way that you cannot do in other networks.
Now, in recent years, there was a lot of effort to start bringing in-network computing capabilities into InfiniBand, which actually makes it possible to analyze data wherever the data is, right? If you look at where we live today, which is the world of data, and that definitely impacts HPC simulations and the complexity of the things that we can simulate and analyze, then in order to make it effective and to solve the big problems, you cannot continue to drive data to the compute, right? It doesn’t work anymore. You hit the performance bottlenecks. It becomes too expensive to do that sort of thing. And so, in recent years, we’re moving to a more data-centric architecture and data center, which means you want to move compute to the data. And as part of that move, InfiniBand, because of the flexibility of the architecture, made it possible to bring in acceleration engines that help analyze the data wherever the data is. Now, that results in overcoming performance bottlenecks and enabling much better performance for both high-performance computing and AI workloads, both of which actually share the same requirements.
insideHPC: So in the specific area of AI, how is InfiniBand helping? It seems like these are very data-intensive kinds of applications. Are you computing the data where it is versus moving it back to a central source? Is that the idea?
Gilad Shainer: Correct. I think there are three or four main things that you need to achieve in order to be able to run those AI workloads, or to create the algorithms that you can use to find better insights in the information that you collect. The first one is that, for AI, you need the biggest pipes in order to move those giant amounts of data and create those AI software algorithms. That’s one thing. Latency is important because you need to drive things faster. RDMA is one of the key technologies that increases the efficiency of moving data and reduces CPU overhead. And by the way, now, all of the AI frameworks that exist out there support RDMA as a default element within the framework itself.
And the last part, which is becoming very, very critical, is the technology called SHARP, which is the Scalable Hierarchical Aggregation and Reduction Protocol. SHARP is part of the in-network computing capabilities. It’s a technology that actually makes it possible to analyze data wherever the data is. And what SHARP does, in a very simple description, is enable data aggregation and data reduction at the network level instead of moving the data all the way to the endpoint before it can do that operation. And SHARP reduces latencies dramatically. So if a data aggregation process takes tens of microseconds, or even gets closer to hundreds of microseconds, when you run it on the software side or on the CPU side, you can actually reduce that time to four microseconds when you run it on the network. So that’s a dramatic reduction of time. And even more than that, AI and these deep learning frameworks are all about data reduction. Data reduction is the main part of it. And today, the frameworks are using elements or entities called parameter servers for that work. And actually, by bringing SHARP into the network, it replaces a lot of those servers by doing the same aggregation within the network, and doing that much, much faster. So we started doing some first testing on TensorFlow, for example, utilizing the SHARP capability. And we’re seeing performance improvements of 20 percent and above for those AI workloads.
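The contrast described here, reducing at a single endpoint versus reducing hierarchically inside the network, can be illustrated with a minimal Python sketch. This is a toy model only, not the SHARP protocol or any InfiniBand API: the function names, the `radix` parameter, and the idea of counting how many vectors converge on one node are all assumptions made for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def elementwise_sum(vectors: List[Vector]) -> Vector:
    """Sum equal-length vectors element by element."""
    return [sum(values) for values in zip(*vectors)]


def endpoint_reduce(host_vectors: List[Vector]) -> Tuple[Vector, int]:
    """Baseline: every host sends its vector to one endpoint
    (for example a parameter server), which does the whole reduction.
    Returns the result and the number of vectors arriving at that endpoint."""
    if not host_vectors:
        raise ValueError("need at least one host vector")
    return elementwise_sum(host_vectors), len(host_vectors)


def in_network_reduce(host_vectors: List[Vector], radix: int) -> Tuple[Vector, int, int]:
    """Toy hierarchical reduction: each hypothetical switch sums up to
    `radix` incoming vectors and forwards a single vector upward.
    Returns the result, the largest fan-in seen at any node, and the
    number of aggregation levels."""
    if not host_vectors:
        raise ValueError("need at least one host vector")
    if radix < 2:
        raise ValueError("radix must be at least 2")
    level = list(host_vectors)
    max_fan_in = 0
    levels = 0
    while len(level) > 1:
        # Group the current level's vectors under hypothetical switches.
        groups = [level[i:i + radix] for i in range(0, len(level), radix)]
        max_fan_in = max(max_fan_in, max(len(group) for group in groups))
        # Each switch emits one partial result to the level above.
        level = [elementwise_sum(group) for group in groups]
        levels += 1
    return level[0], max_fan_in, levels


if __name__ == "__main__":
    hosts = [[float(i), float(2 * i)] for i in range(16)]
    print(endpoint_reduce(hosts))
    print(in_network_reduce(hosts, radix=4))
```

Both functions produce the same reduced vector. The difference the sketch highlights is that no single node in the hierarchical version receives more than `radix` vectors, whereas the endpoint in the baseline receives one vector per host. The sketch says nothing about real latencies.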
insideHPC: Gilad, it’s exciting to see this come together, and I’m looking forward to seeing exascale AI coming in the near future.
Gilad Shainer: Well, there is a lot of activity now in the world, and I think that the HPC market is fueled by the race to exascale worldwide. And InfiniBand is actually one of the leading network options toward exascale.
I’ll give you one example. The most recent InfiniBand technology out there is HDR InfiniBand, which drives 200 gigabits per second. And because it’s flexible, it can build different types of topologies. You can actually set yourself up, or make yourself ready, for exascale. So today, if you’re taking the HDR technology, with just three hops between switches you can build compute data centers that connect more than 160,000 endpoints, right? And then, if you look at the upcoming generation of InfiniBand, which is NDR InfiniBand, which is going to go to 400 gigabits per second per port, that will make it possible to connect more than a million endpoints with just three switch hops, all right? So it’s a technology that enables all the large supercomputers of today, and that’s the best technology that could enable the exascale generation.
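To see why endpoint counts grow so quickly with only a few switch levels, here is a small Python sketch using the textbook formula for a non-blocking fat tree, where switches with `radix` ports arranged in `tiers` levels support 2 × (radix / 2)^tiers endpoints. The interview does not say which topology or port configuration lies behind the quoted figures, so this is an assumption for illustration and is not meant to reproduce them. The function name and the example radix values are hypothetical.

```python
def fat_tree_endpoints(radix: int, tiers: int) -> int:
    """Maximum endpoints in a non-blocking fat tree (folded Clos) built
    from identical switches with `radix` ports and `tiers` switch levels.

    Assumption: every switch below the top tier splits its ports evenly
    between downlinks and uplinks."""
    if radix < 2 or radix % 2 != 0:
        raise ValueError("radix must be an even number of ports, at least 2")
    if tiers < 1:
        raise ValueError("tiers must be at least 1")
    return 2 * (radix // 2) ** tiers


if __name__ == "__main__":
    # Illustrative radix values only; not taken from the interview.
    for radix in (40, 80):
        for tiers in (1, 2, 3):
            print(f"radix={radix} tiers={tiers} -> {fat_tree_endpoints(radix, tiers)} endpoints")
```

Because the count grows with the cube of half the radix at three tiers, doubling the number of ports per switch multiplies the reachable endpoints by eight. Other topologies trade off hop count and endpoint count differently.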