Accelerating Hadoop, Spark, and Memcached with HPC Technologies

DK Panda, Ohio State University

In this video from the OpenFabrics Workshop, DK Panda from Ohio State University presents: HPC Meets Big Data: Accelerating Hadoop, Spark, and Memcached with HPC Technologies.

“Modern HPC clusters have many advanced features, such as multi-/many-core architectures, high-performance RDMA-enabled interconnects, SSD-based storage devices, burst buffers, and parallel file systems. However, current-generation Big Data processing middleware (such as Hadoop, Spark, and Memcached) has not fully exploited these features. This talk will present RDMA-based designs using OpenFabrics Verbs and heterogeneous storage architectures to accelerate multiple components of Hadoop (HDFS, MapReduce, RPC, and HBase), Spark, and Memcached. It will also give an overview of the associated RDMA-enabled software libraries, which are being designed and publicly distributed as part of the HiBD project for Apache Hadoop (both integrated and as plug-ins for the Apache, HDP, and Cloudera distributions), Apache Spark, and Memcached. Finally, the talk will address the need for designing benchmarks using a multi-layered and systematic approach, which can be used to evaluate the performance of these Big Data processing middleware.”

Dr. DK Panda is a Professor and Distinguished Scholar of Computer Science at the Ohio State University. He obtained his Ph.D. in computer engineering from the University of Southern California. His research interests include parallel computer architecture, high-performance networking, InfiniBand, network-based computing, exascale computing, programming models, GPUs and accelerators, high-performance file systems and storage, virtualization and cloud computing, and Big Data (Hadoop (HDFS, MapReduce, and HBase) and Memcached). He has published over 400 papers in major journals and international conferences in these areas.

Dr. Panda and his research group have been doing extensive research on modern networking technologies, including InfiniBand, Omni-Path, iWARP, and RoCE. His group is currently collaborating with National Laboratories and leading InfiniBand, Omni-Path, iWARP, and RoCE companies on designing various subsystems of next-generation high-end systems. The MVAPICH2 software libraries (high-performance MPI and MPI+PGAS over InfiniBand, iWARP, and RoCE, with support for GPGPUs, Xeon Phi, and virtualization), developed by his research group, are currently used by more than 2,700 organizations in 83 countries. These software packages have enabled several InfiniBand clusters to earn places in the latest TOP500 ranking, and more than 405,000 downloads have taken place from the project website alone. The packages are also available in the software stacks of network vendors (InfiniBand and iWARP), server vendors, and Linux distributors (such as Red Hat and SUSE).
