In this video from the 2016 OpenFabrics Workshop, DK Panda from Ohio State University presents: Exploiting HPC Technologies to Accelerate Big Data Processing.
“Modern HPC clusters are having many advanced features, such as multi-/many-core architectures, high-performance RDMA-enabled interconnects, SSD-based storage devices, burst-buffers and parallel file systems. However, current generation Big Data processing middleware (such as Hadoop, Spark, and Memcached) have not fully exploited the benefits of the advanced features on modern HPC clusters. This talk will present RDMA-based designs using OpenFabrics Verbs and heterogeneous storage architectures to accelerate multiple components of Hadoop (HDFS, MapReduce, RPC, and HBase), Spark and Memcached. An overview of the associated RDMA-enabled software libraries (being designed and publicly distributed as a part of the HiBD project, http://hibd.cse.ohio-state.edu) for Apache Hadoop (integrated and plug-ins for Apache and HDP distributions), Apache Spark and Memcached will be presented. The talk will also address the need for designing benchmarks using a multi-layered and systematic approach, which can be used to evaluate the performance of these Big Data processing middleware.”