How InfiniBand is Powering New Capabilities for Machine Learning with RDMA


In this video from GTC 2017, Scot Schultz from Mellanox describes how high performance InfiniBand is powering new capabilities for Machine Learning with RDMA.

“Mellanox Solutions accelerate many of the world’s leading artificial intelligence and machine learning platforms. Machine learning is a pillar of today’s technological world, offering solutions that enable better and more accurate decision making based on the great amounts of data being collected. Machine learning encompasses a wide range of applications, ranging from security, finance, and image and voice recognition, to self-driving cars, healthcare and smart cities. Mellanox solutions enable companies and organizations such as Baidu, Facebook, NVIDIA, PayPal, Tencent, Yahoo and many more to leverage machine learning platforms to enhance their competitive advantage.”

Machine learning applications are based on training deep neural networks, which requires complex computation and fast, efficient data delivery. Mellanox solutions provide smart offloads such as RDMA and GPUDirect, along with In-Network Computing capabilities, that dramatically improve neural network training performance and the performance of machine learning applications overall.

Mellanox has accelerated popular frameworks such as TensorFlow, Paddle, Caffe and Apache Spark with RDMA, and continues to innovate and accelerate solutions for fast, scalable distributed training of large and powerful models. By providing low latency, high bandwidth, high message rate, and smart offloads, Mellanox interconnect solutions are the most deployed high-speed interconnect for large-scale machine learning, for both training and inferencing systems. Utilizing Mellanox technology, Yahoo has demonstrated an 18X speedup for image recognition; Tencent has achieved world-record performance for data sorting; PayPal has been able to detect fraud in real time; and NVIDIA has incorporated Mellanox solutions inside its DGX-1 machine learning appliance to provide 400Gb/s of data throughput and to build the most power-efficient machine learning supercomputer.

Sign up for our insideHPC Newsletter