Mellanox officially outs NVIDIA GPUDirect, NVIDIA says only the first step


When I reported on the PFLOPS super that the Chinese Academy of Sciences has built with NVIDIA GPUs (the new Fermi-enabled Teslas), I mentioned it was using NVIDIA’s GPUDirect technology to improve the performance of GPU-to-GPU transfers. Mellanox is the lead partner with NVIDIA in developing that technology, which, according to information I got in conversations with NVIDIA yesterday, involved changes to the Linux kernel (done by NVIDIA) and to NVIDIA’s drivers, as well as to Mellanox’s HCA driver. Nota bene: NVIDIA has since told me they are working on contributing those kernel changes back to the community, but in the meantime there is a patch.

Yesterday Mellanox talked a little more about GPUDirect:

Mellanox…announced the immediate availability of NVIDIA GPUDirect technology with Mellanox ConnectX®-2 40Gb/s InfiniBand adapters that boosts GPU-based cluster efficiency and increases performance by an order of magnitude over today’s fastest high-performance computing clusters.

Today’s current architecture requires the CPU to handle memory copied between the GPU and the InfiniBand network. Mellanox was the lead partner in the development of NVIDIA GPUDirect, a technology that reduces the involvement of the CPU, reducing latency for GPU-to-InfiniBand communication by up to 30 percent. This communication time speedup can potentially add up to a gain of over 40 percent in application productivity when a large number of jobs are run on a server cluster. NVIDIA GPUDirect technology with Mellanox scalable HPC solutions is in use today in multiple HPC centers around the world, providing leading engineering and scientific application performance acceleration.
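To make the difference concrete, here is a toy model in plain Python of the two data paths. It is only a sketch: it assumes that without GPUDirect the GPU driver and the HCA driver each keep their own pinned host buffer, so the CPU has to copy between them, and that with GPUDirect both drivers share one pinned buffer. The function names are hypothetical and nothing here calls CUDA or the InfiniBand stack; DMA transfers are simply modeled as buffer construction.

```python
def send_via_separate_buffers(gpu_data: bytes):
    """Assumed pre-GPUDirect path: two pinned host buffers, one CPU copy."""
    cpu_copies = 0
    # GPU DMA into the GPU driver's pinned host buffer (no CPU work).
    gpu_pinned = bytearray(gpu_data)
    # The CPU copies the payload into the HCA driver's own pinned buffer.
    hca_pinned = bytearray(len(gpu_pinned))
    hca_pinned[:] = gpu_pinned
    cpu_copies += 1
    # HCA DMA out of its pinned buffer onto the wire (no CPU work).
    wire = bytes(hca_pinned)
    return wire, cpu_copies


def send_via_shared_buffer(gpu_data: bytes):
    """Assumed GPUDirect path: both devices use the same pinned buffer."""
    cpu_copies = 0
    # GPU DMA into the shared pinned host buffer.
    shared_pinned = bytearray(gpu_data)
    # HCA DMA reads the very same buffer; the CPU never touches the payload.
    wire = bytes(shared_pinned)
    return wire, cpu_copies


if __name__ == "__main__":
    payload = b"halo exchange data"
    for send in (send_via_separate_buffers, send_via_shared_buffer):
        wire, copies = send(payload)
        print(f"{send.__name__}: delivered={wire == payload}, cpu_copies={copies}")
```

Both paths deliver the same bytes; the only thing that changes in this model is whether the CPU has to touch the payload in between.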

This is way cool, but NVIDIA says that it's only the first step in a line of changes that will bring improved performance to devices that hang off the PCI-e bus. For example, an SSD connected via PCI-e could send its data directly to the GPU using this technology instead of having to go through the CPU, potentially speeding up data transfers dramatically. This is becoming ever more important as compute nodes continue to get more powerful at a faster rate than IO channels can feed them data to work on.
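As a rough sanity check on the two figures in the press release quoted above, the sketch below relates a 30 percent cut in communication time to job throughput. The model is my assumption, not Mellanox's stated method: a job's runtime is split into a communication share and everything else, only the communication share shrinks, and "productivity" means jobs completed per unit time. The function name `throughput_gain` is hypothetical.

```python
def throughput_gain(comm_fraction: float, latency_reduction: float = 0.30) -> float:
    """Fractional increase in jobs per unit time when only the
    communication share of runtime shrinks by latency_reduction."""
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must be between 0 and 1")
    # Runtime is normalized to 1.0 before the change.
    new_runtime = 1.0 - comm_fraction * latency_reduction
    return 1.0 / new_runtime - 1.0


if __name__ == "__main__":
    for share in (0.25, 0.50, 0.75, 1.00):
        gain = throughput_gain(share)
        print(f"communication share {share:.0%}: throughput gain {gain:.1%}")
```

Under this model a gain above 40 percent only appears when a job is almost entirely communication-bound, since 1/0.7 is about 1.43. That fits the "up to" and "potentially" wording of the announcement.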


  1. I’m a little confused by your SSD comment. Surely, since GPUDirect uses Mellanox’s 40Gb/s IB, this is a high-speed connection directly between Tesla cards, bypassing PCIe. Peer-to-peer transfers over PCIe, like from an SSD to a Tesla card, have been around for a while. (Not necessarily used, but possible.)

  2. John West says

    Greg, have you checked out the video briefing I linked to in the China post? The approach for this case is outlined starting around 5:48 and running through about minute 8. As far as I can tell, there isn’t a direct connection (no physical cable, for example) between the GPU and the HCA; it’s just that both devices can now have direct access to the same pinned memory area, so the CPU itself doesn’t have to be directly involved in the transfer (and no buffer copy is needed).

  3. Hello,

    Greg, you were saying “Peer-to-peer transfers over PCIe, like from SSD to a Tesla card, have been around for a while”. May I ask how one could achieve peer-to-peer transfers over PCIe between, say, an FPGA card and a Tesla?