Bitfusion Enables InfiniBand-Attached GPUs on Any VM

This week Bitfusion unveiled a new reference solution architecture that combines the company’s FlexDirect with VMware and Mellanox platforms to attach GPUs to any virtual machine over the network.

“With Bitfusion along with Mellanox and VMware, IT can now offer an ability to mix bare metal and virtual machine environments, such that GPUs in any configuration can be attached to any virtual machine in the organization, enabling easy access of GPUs to everyone in the organization,” said Subbu Rama, co-founder and chief product officer, Bitfusion. “IT can now pool together resources and offer an elastic GPU as a service to their organizations.”

With the new reference solution, GPU accelerators can now be part of a common infrastructure resource pool and available for use by any virtual machine in the data center in full or partial configurations, attached over the network. The solution works with any type of GPU server and any networking configuration such as TCP, RoCE or InfiniBand. It also leverages Bitfusion’s FlexDirect to remotely attach GPUs over the network as well as create fractional GPUs.
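To make the idea of a shared pool with full and partial GPUs concrete, here is a minimal conceptual sketch in Python. It is not FlexDirect's API or implementation; the `Gpu` and `GpuPool` classes, the `attach` and `detach` methods, and the first-fit placement policy are all hypothetical names and choices made for illustration only.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Gpu:
    """One physical GPU in the shared pool (hypothetical model)."""
    server: str
    index: int
    free: Fraction = Fraction(1)  # share of the device still unallocated


@dataclass
class GpuPool:
    """Toy allocator that hands out whole or fractional GPUs to named VMs."""
    gpus: list
    leases: dict = field(default_factory=dict)  # VM name -> [(Gpu, share), ...]

    def attach(self, vm: str, share: Fraction) -> Gpu:
        """Give `vm` a `share` (0 < share <= 1) of the first GPU with room."""
        if not 0 < share <= 1:
            raise ValueError("share must be in (0, 1]")
        for gpu in self.gpus:
            if gpu.free >= share:
                gpu.free -= share
                self.leases.setdefault(vm, []).append((gpu, share))
                return gpu
        raise RuntimeError("no GPU in the pool has enough free capacity")

    def detach(self, vm: str) -> None:
        """Return everything `vm` holds to the pool."""
        for gpu, share in self.leases.pop(vm, []):
            gpu.free += share
```

In this toy model, two VMs that each call `attach` with `Fraction(1, 2)` end up sharing one physical GPU, while a VM that asks for `Fraction(1)` gets a whole device to itself. Calling `detach` returns the capacity to the pool, which is the elastic behavior the article describes at a much larger scale.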

Mellanox and Bitfusion set up the infrastructure configuration shown in Figure 1 to emulate a real-life Elastic AI Infrastructure. The test bed included a cluster of Dell R740 GPU servers and Dell R640 CPU servers (no GPUs), a Mellanox SN2700 100GbE switch and Mellanox ConnectX-5 cards. On the clients, VMware vSphere ESXi 6.5 was set up along with Ubuntu 16.04 for the VM operating system, CUDA 9.1, cuDNN 7.3 and TensorFlow 1.9.

Bitfusion FlexDirect runs in user space and doesn’t require any changes to the OS, drivers, kernel modules or AI frameworks. It’s worth noting that FlexDirect can also support a heterogeneous cluster with mixed operating systems: for instance, a FlexDirect client running on Ubuntu can connect to a FlexDirect server running on CentOS, and vice versa.
