How are HPC networks programmed?

Practically every HPC network has its own low-level programming interface. Myrinet has both MX (Myrinet Express) and GM (Glenn’s Messages). QsNet has Elan and the associated Tports interface. SCI has SISCI (Software Infrastructure for SCI). VIA has VIPL (the VI Provider Library). InfiniBand and iWARP do not define standard programming interfaces: their specifications describe the operations ("verbs") an adapter must support without fixing an API. To fill that gap, the Open Group has defined the portable RDMA-Capable NIC Programming Interface (RNIC-PI), intended to be used in place of vendor-specific verb implementations.

Software written directly for a low-level interface is not portable, so it helps to place an upper-layer protocol (ULP) between applications and the network. An early transport-independent API is the User Direct Access Programming Library (uDAPL), defined by the DAT Collaborative. uDAPL provides much of the same functionality as VIPL, but replaces transport-specific notions such as queue pairs and virtual interfaces with a more generic model of asynchronous communication. It also provides some quality-of-service controls, because QoS is a feature of InfiniBand.

Borrowing heavily from uDAPL is the Open Group’s Interconnect Transport API (IT-API), which includes unreliable datagram communication, another InfiniBand feature.

While these new APIs may suit new applications, plenty of existing software uses Sockets. To maintain backward compatibility with such legacy applications, both InfiniBand and iWARP define the Sockets Direct Protocol (SDP). SDP is a byte-stream transport protocol (SOCK_STREAM) similar to TCP, but it carries the stream over RDMA hardware rather than through a TCP/IP stack. Each SDP socket corresponds to a single queue pair.

With some SDP implementations, an existing application needs only to be relinked to use the new interconnect. To exploit RDMA networks more fully, however, the Open Group has defined the Extended Sockets API (ES-API), with functions that handle asynchronous communication on RDMA networks. The API merely adds a few new subroutines to traditional Sockets.

Most software today is written using Sockets or uDAPL (or their Open Group extensions). Software is written directly against a network's native interface only for performance; that choice is akin to writing an implementation in assembly rather than C.

For more information, see the Open Group’s Interconnect Software Consortium.