Infiniband and Exascale. An interview with Gilad Shainer, Mellanox


We caught up with Gilad Shainer from Mellanox to learn more about how InfiniBand might play in the exascale arena.

The Exascale Report: InfiniBand for Exascale – do you see an option for Exascale computing to use InfiniBand?

Shainer: InfiniBand was designed to be a high-performance, low-latency, and efficient solution for connecting servers and storage. InfiniBand is a standard interconnect, with specifications continually developed by the InfiniBand Trade Association (IBTA). One of its advantages is the rapid evolution of the specification to meet new demands and new developments in high-performance computing technologies. As a proof point, InfiniBand vendors deliver new bandwidth speeds every 2-3 years. The time from specification to product is short, and this has enabled InfiniBand to become a leading solution in the HPC arena. On the TOP500 supercomputers list, InfiniBand connects 61% of the top 100 systems and 5 of the 10 systems that sustain Petaflop performance. One of the goals of the IBTA is to pave the road to Exascale by delivering constant improvements in network performance, scalability and reliability. InfiniBand delivers capabilities today that are superior to other technologies, both proprietary and standard, and does so in the most efficient and economical way. One of the most important considerations for Exascale, besides performance of course, will be affordability. We need economical solutions for the Exascale machines.

TER: Paving the road to Exascale is a great theme – but what has been done at the InfiniBand specification level to actually fulfill this promise?

Shainer: The latest release of InfiniBand is FDR (Fourteen Data Rate, 14Gb/s data rate per lane) 56Gb/s InfiniBand. FDR is the next-generation InfiniBand technology specified by the IBTA. InfiniBand lane speeds continue to increase to support end-user demands for improved return-on-investment and performance benefits, as well as robust network capabilities to support multi-core processors and accelerators. FDR 56Gb/s InfiniBand introduces several enhancements for higher performance, scalability and reliability in performance-demanding data centers.

1. Network Bandwidth

The FDR InfiniBand link speed has increased to 14Gb/s per lane, or 56Gb/s per 4-lane port (a typical InfiniBand implementation), a data rate increase of more than 70 percent compared to previous InfiniBand generations.
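The figures above are easy to verify with arithmetic. One input is an assumption not stated in the text: that the effective per-lane data rate of the previous generation (QDR) is 8 Gb/s, i.e. 10 Gb/s signaling with 8b/10b encoding overhead removed.

```python
# Check of the per-lane and per-port figures quoted above.
# Assumption: QDR's effective per-lane data rate is 8 Gb/s
# (10 Gb/s signaling minus 8b/10b encoding overhead).
fdr_lane_gbps = 14
lanes = 4
fdr_port_gbps = fdr_lane_gbps * lanes            # 56 Gb/s per 4-lane port
qdr_lane_gbps = 8
increase = (fdr_lane_gbps - qdr_lane_gbps) / qdr_lane_gbps
print(fdr_port_gbps)      # 56
print(f"{increase:.0%}")  # 75%, hence "more than 70 percent"
```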

2. Network Latency

FDR InfiniBand interconnect solutions accelerate data delivery with reduced fabric latency. The reduction in latency enables faster communication and synchronization between application processes and increases the cluster performance and the overall return-on-investment.

3. Network Efficiency

The link encoding for FDR InfiniBand was changed from the 8bit/10bit scheme used in SDR, DDR and QDR InfiniBand to 64bit/66bit. This allows higher network efficiency for data center server and storage connectivity by reducing the ratio of control bits to data bits sent on the network. With FDR InfiniBand, the network spends more time on actual data delivery between application job processes compared to SDR, DDR, and QDR, which in turn increases the overall network productivity.
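The efficiency gain from the encoding change can be quantified directly from the two ratios named above: the fraction of transmitted line bits that carry payload data.

```python
# Line-coding efficiency: fraction of transmitted bits carrying payload.
eff_8b10b = 8 / 10    # SDR/DDR/QDR: 0.80 -> 20% encoding overhead
eff_64b66b = 64 / 66  # FDR: ~0.97 -> ~3% encoding overhead
print(f"8b/10b efficiency:  {eff_8b10b:.1%}")
print(f"64b/66b efficiency: {eff_64b66b:.1%}")
```

In other words, roughly one in five line bits under 8b/10b is encoding overhead, versus about one in thirty-three under 64b/66b.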

4. Network Reliability and Data Integrity

InfiniBand provides a scalable and reliable high-speed interconnect for servers and storage. For data integrity and guaranteed reliable data transfer between end-nodes (servers and storage), InfiniBand uses an end-to-end hardware reliability mechanism. Each InfiniBand packet contains two Cyclic Redundancy Checks (CRCs). The Invariant CRC (ICRC) covers all fields which do not change as the packet traverses the fabric. The Variant CRC (VCRC) covers the entire packet. The combination of the two CRCs allows switches and routers to modify the appropriate fields while maintaining end-to-end data integrity. If data corruption occurs due to bit errors, the packet is discarded by the switch or the adapter and re-transmitted from the source to the target. To accelerate data retransmission, a new mechanism was added to FDR InfiniBand – Forward Error Correction (FEC). FEC allows the InfiniBand devices (adapters and switches) to fix bit errors throughout the network and reduce the overhead of data re-transmission between the end-nodes. The FDR InfiniBand FEC mechanism utilizes redundancy in the 64/66-bit encoding to enable error correction with no bandwidth loss, and it works over each link independently, on each of the link lanes. The new mechanism delivers superior network reliability, especially for large-scale data centers, high-performance computing or Web 2.0 centers, and delivers a predictable low-latency characteristic that is critical for large-scale applications and synchronization.
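The division of labor between the two CRCs can be sketched in a toy model. Here `zlib.crc32` stands in for both checks purely for illustration; the actual specification uses its own polynomials and widths (a 32-bit ICRC and a 16-bit VCRC), and the field layout below is invented, not the real packet format.

```python
import zlib

# Toy model of InfiniBand's two-CRC scheme. zlib.crc32 stands in for
# both CRCs; the real spec defines distinct polynomials and widths,
# so treat this purely as an illustration of the mechanism.

def make_packet(invariant_fields: bytes, mutable_fields: bytes):
    # ICRC covers only fields that never change as the packet
    # traverses the fabric; VCRC covers the whole packet.
    icrc = zlib.crc32(invariant_fields)
    vcrc = zlib.crc32(invariant_fields + mutable_fields + icrc.to_bytes(4, "big"))
    return {"inv": invariant_fields, "mut": mutable_fields,
            "icrc": icrc, "vcrc": vcrc}

def switch_hop(pkt, new_mutable: bytes):
    # A switch may rewrite hop-dependent fields: it recomputes the
    # VCRC but must leave the ICRC untouched.
    pkt["mut"] = new_mutable
    pkt["vcrc"] = zlib.crc32(pkt["inv"] + pkt["mut"] + pkt["icrc"].to_bytes(4, "big"))
    return pkt

def receiver_check(pkt) -> bool:
    # The end node verifies both CRCs; the ICRC provides end-to-end
    # integrity even though intermediate hops rewrote parts of the packet.
    ok_icrc = zlib.crc32(pkt["inv"]) == pkt["icrc"]
    ok_vcrc = zlib.crc32(pkt["inv"] + pkt["mut"] + pkt["icrc"].to_bytes(4, "big")) == pkt["vcrc"]
    return ok_icrc and ok_vcrc

pkt = make_packet(b"payload+invariant-headers", b"hop-fields-A")
pkt = switch_hop(pkt, b"hop-fields-B")
print(receiver_check(pkt))  # True: ICRC survived the header rewrite
```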

TER: The software interface is no less important than the bandwidth and latency – does the InfiniBand software interface enable scalable computing?

Shainer: The InfiniBand specification defines not only the physical, link and transport layers, but also the software interface – a rich and flexible one – the InfiniBand verbs. The verbs interface provides the capability to open secure connections between processes and to use communication semantics of send/receive, write/read and atomics, which fits well with environments such as MPI (send/receive) or PGAS (write/read, or put/get). InfiniBand also provides several transport services, and the complete capability to move data from one process's memory to another without the involvement of the host CPU. This is a critical capability for ensuring the lowest CPU overhead (meaning higher CPU efficiency and lower power consumption), as well as the ability to create direct communication between other compute elements.
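The contrast between the two semantics can be illustrated with a toy model. All names here are invented for illustration – this is not the libibverbs API – but the shapes match the description above: a two-sided send requires the target to have posted a receive buffer, while a one-sided write lands directly in the target's registered memory with no target-side call.

```python
# Toy model (invented names, not the real verbs API) contrasting the
# two communication semantics the verbs interface exposes.

class Endpoint:
    def __init__(self):
        self.memory = {}          # "registered" regions: name -> bytearray
        self.recv_queue = []      # buffers posted for two-sided receives
        self.completions = []

    def register_memory(self, name: str, size: int):
        self.memory[name] = bytearray(size)

    def post_recv(self, name: str):
        self.recv_queue.append(name)

# Two-sided send/receive (MPI-style): the target must have posted a buffer.
def send(src_data: bytes, target: Endpoint):
    region = target.recv_queue.pop(0)   # matched against a posted receive
    target.memory[region][:len(src_data)] = src_data
    target.completions.append(("recv", region))

# One-sided write (PGAS-style put): data lands directly in the target's
# registered memory, with no target-side receive or CPU involvement.
def rdma_write(src_data: bytes, target: Endpoint, region: str, offset: int):
    target.memory[region][offset:offset + len(src_data)] = src_data

a, b = Endpoint(), Endpoint()
b.register_memory("buf", 16)
b.post_recv("buf")
send(b"hello", b)                   # two-sided: needed the posted receive
rdma_write(b"world", b, "buf", 5)   # one-sided: no action by b required
print(bytes(b.memory["buf"][:10]))  # b'helloworld'
```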

We have seen many proprietary software interfaces developed and then abandoned over time, due to missing elements or to their proprietary nature. Over time, new commands have been added to InfiniBand to provide new capabilities such as memory management. We expect the InfiniBand specification to continue to evolve and to provide the capabilities needed for any programming interface that will be required in the Exascale era.

TER: What is your response to those who say InfiniBand is not the answer for Exascale?

Shainer: InfiniBand was the answer for both TeraScale and PetaScale. Its performance does not fall short of any other interconnect solution, and furthermore, its development is faster than that of any other interconnect technology. If any of the current technologies is a candidate, InfiniBand would be the leading one. One can claim that no technology today meets the Exascale requirements, and new technologies are certainly needed. But rather than invest money in new development, it makes sense to take something that already has a full hardware and software eco-system and add the elements needed for Exascale. That will be the most cost-effective and fastest way to Exascale, and it is what the IBTA is working on today.

For related stories, visit The Exascale Report Archives.