Accelerating Apache Spark with RDMA


Yuval Degani is a Senior Software R&D Manager – Big Data and Machine Learning Network Acceleration at Mellanox

In this video from the OpenFabrics Workshop, Yuval Degani from Mellanox presents: Accelerating Apache Spark with RDMA.

“Apache Spark is today’s fastest growing Big Data analysis platform. Spark workloads typically maintain a persistent data set in memory, which is accessed multiple times over the network. Consequently, networking IO performance is a critical component in Spark systems. RDMA’s performance characteristics, such as high bandwidth, low latency, and low CPU overhead, offer a good opportunity for accelerating Spark by improving its data transfer facilities.”

“In this talk, we present a Java-based, RDMA network layer for Apache Spark. The implementation optimized both the RPC and the Shuffle mechanisms for RDMA. Initial benchmarking shows up to 25% improvement for Spark Applications.”

Yuval Degani is a Senior Software R&D Manager – Big Data and Machine Learning Network Acceleration at Mellanox. He is in charge of accelerating Big Data and Machine Learning frameworks with Mellanox's broad hardware and software technology offerings, leading the design and implementation of high-performance hyperscale solutions from the hardware level up to and including the application level. His work focuses on Apache Spark, Apache Hadoop, TensorFlow, and more.

See more talks in the OpenFabrics Workshop Video Gallery
