Working with Big Data can bog down even the fastest system networks. With a mission to speed up analytics, the High-Performance Big Data (HiBD) team at Ohio State University has released RDMA-Apache-Spark 0.9.1.
HiBD packages are used by more than 135 organizations in 20 countries to accelerate Big Data applications. As of January 2016, the project's site had recorded more than 14,450 downloads.
The HiBD project contains the following packages:
- RDMA-based Apache Spark (RDMA-Spark) (NEW)
- RDMA-based Apache Hadoop 2.x (RDMA-Hadoop-2.x)
- RDMA-based Apache Hadoop 1.x (RDMA-Hadoop-1.x)
- RDMA-based Memcached (RDMA-Memcached)
- OSU HiBD-Benchmarks (OHB)
RDMA-Apache-Spark 0.9.1 Features
- Based on Apache Spark 1.5.1
- High performance design with native InfiniBand and RoCE support at the verbs-level for Spark
  - RDMA-based data shuffle
  - SEDA-based shuffle architecture
  - Efficient connection management and sharing
  - Non-blocking and chunk-based data transfer
  - Off-JVM-heap buffer management
- Compliant with Apache Spark 1.5.1 APIs and applications
- Easily configurable for native InfiniBand, RoCE, and traditional sockets-based support (Ethernet and InfiniBand with IPoIB)
- Tested with
  - Mellanox InfiniBand adapters (DDR, QDR, and FDR)
  - RoCE support with Mellanox adapters
  - Various multi-core platforms
  - RAM Disks, SSDs, and HDDs
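Two of the features listed above, chunk-based data transfer and off-JVM-heap buffer management, can be illustrated with a minimal Java sketch. This is not RDMA-Spark code: the class and method names (`ChunkedOffHeapCopy`, `toChunks`, `reassemble`) and the 4096-byte chunk size are assumptions made for illustration, and the sketch uses only the standard `java.nio.ByteBuffer` API. It shows a payload being split into fixed-size direct buffers and reassembled; it does not model RDMA verbs, non-blocking I/O, or the shuffle architecture.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: these names are hypothetical and are not part of
// RDMA-Spark or Apache Spark.
final class ChunkedOffHeapCopy {

    // Split payload into direct buffers of at most chunkSize bytes each.
    // Every returned buffer is flipped, so it is ready to be read.
    static List<ByteBuffer> toChunks(byte[] payload, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<ByteBuffer> chunks = new ArrayList<>();
        for (int offset = 0; offset < payload.length; offset += chunkSize) {
            int length = Math.min(chunkSize, payload.length - offset);
            // Direct buffers are typically allocated outside the
            // garbage-collected JVM heap.
            ByteBuffer chunk = ByteBuffer.allocateDirect(length);
            chunk.put(payload, offset, length);
            chunk.flip();
            chunks.add(chunk);
        }
        return chunks;
    }

    // Reassemble the chunks, in order, into a single heap array.
    static byte[] reassemble(List<ByteBuffer> chunks) {
        int total = 0;
        for (ByteBuffer chunk : chunks) {
            total += chunk.remaining();
        }
        byte[] out = new byte[total];
        int offset = 0;
        for (ByteBuffer chunk : chunks) {
            int length = chunk.remaining();
            chunk.get(out, offset, length);
            offset += length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[10_000];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        List<ByteBuffer> chunks = toChunks(payload, 4096);
        byte[] copy = reassemble(chunks);
        // Prints "3 chunks, intact=true"
        System.out.println(chunks.size() + " chunks, intact=" + Arrays.equals(payload, copy));
    }
}
```

Keeping each chunk in a direct buffer means the data can be handed to native I/O code without first being copied out of the managed heap, which is the general motivation for off-heap buffering in a design like this.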
Sample performance numbers for RDMA-Apache-Spark, measured with benchmarks, are available under the 'Performance' tab of the project website.
All questions, feedback and bug reports are welcome. Please post to the rdma-spark-discuss mailing list (rdma-spark-discuss at cse.ohio-state.edu).