Podcast: HPC & AI Convergence Enables AI Workload Innovation

In this Conversations in the Cloud podcast, Esther Baldwin from Intel describes how the convergence of HPC and AI is driving innovation. “On the topic of HPC & AI converged clusters, there’s a perception that if you want to do AI, you must stand up a separate cluster, which Esther notes is not true. Existing HPC customers can do AI on their existing infrastructure with solutions like HPC & AI converged clusters.”

Accelerate Your Apache Spark with Intel Optane DC Persistent Memory

Piotr Balcer and Cheng Xu from Intel gave this talk at the 2019 Spark+AI Summit. “Intel Optane DC persistent memory breaks the traditional memory/storage hierarchy and scales up the computing server with higher-capacity persistent memory. It also brings higher bandwidth and lower latency than storage devices such as SSDs or HDDs. And Apache Spark is widely used for analytics workloads such as SQL and machine learning in cloud environments.”
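
A generic illustration (not Intel's Optane-specific code) of the memory-capacity pressure the talk addresses: when Spark caches a dataset, the partitions that fit in memory stay resident and the rest spill to disk, so a larger memory tier shifts that balance. The path and column name below are hypothetical.

```scala
// Sketch: caching a DataFrame so repeated queries reuse in-memory data.
// MEMORY_AND_DISK keeps as many partitions in memory as capacity allows
// and spills the remainder; more memory capacity means less spilling.
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("CacheDemo").getOrCreate()

    val events = spark.read.parquet("/data/events.parquet") // hypothetical input
    events.persist(StorageLevel.MEMORY_AND_DISK)

    // Repeated aggregations over the cached data are where extra capacity pays off.
    events.groupBy("userId").count().show()
    spark.stop()
  }
}
```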

NEC Embraces Open Source Frameworks for SX-Aurora Vector Computing

In this video from ISC 2019, Dr. Erich Focht from NEC Deutschland GmbH describes how the company is embracing open source frameworks for the SX-Aurora TSUBASA Vector Supercomputer. “Until now, with the existing server processing capabilities, developing complex models on graphical information for AI has consumed significant time and host processor cycles. NEC Laboratories has developed the open-source Frovedis framework over the last 10 years, initially for parallel processing in Supercomputers. Now, its efficiencies have been brought to the scalable SX-Aurora vector processor.”

Deep Learning Open Source Framework Optimized on Apache Spark

Intel recently released BigDL, an open source, highly optimized, distributed deep learning framework for Apache Spark. It turns Hadoop/Spark into a unified platform for data storage, data processing and mining, feature engineering, traditional machine learning, and deep learning workloads, resulting in better economies of scale, higher resource utilization, easier use and development, and better TCO.
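
For a flavor of the framework, here is a minimal sketch of defining a multi-layer perceptron with BigDL's Torch-style Scala API; the layer sizes are arbitrary, and data loading and the Optimizer-based training loop are omitted.

```scala
// Minimal BigDL model definition (Torch-style Scala API); sizes are arbitrary.
import com.intel.analytics.bigdl.nn._
import com.intel.analytics.bigdl.numeric.NumericFloat

object MlpSketch {
  def build() = {
    val model = Sequential()
    model
      .add(Reshape(Array(28 * 28)))  // flatten a 28x28 input image
      .add(Linear(28 * 28, 100))     // fully connected hidden layer
      .add(ReLU())
      .add(Linear(100, 10))          // 10-class output
      .add(LogSoftMax())
    model
  }
}
```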

SpaRC: Scalable Sequence Clustering using Apache Spark

Zhong Wang from the Genome Institute at LBNL gave this talk at the Stanford HPC Conference. “Whole-genome shotgun-based next-generation transcriptomics and metagenomics studies often generate 100 to 1000 gigabytes (GB) of sequence data derived from tens of thousands of different genes or microbial species. Here we describe an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions reads based on their molecule of origin to enable downstream assembly optimization.”
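
The following toy sketch (not SpaRC's actual code) illustrates the underlying idea in Spark: emit (k-mer, read) pairs and group reads that share k-mers, since reads sharing many k-mers likely originate from the same molecule.

```scala
// Toy illustration of k-mer based read grouping in Spark (not SpaRC itself).
import org.apache.spark.sql.SparkSession

object KmerGrouping {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("KmerGrouping").getOrCreate()
    val sc = spark.sparkContext

    // (readId, sequence) pairs; real input would come from FASTQ files.
    val reads = sc.parallelize(Seq(
      ("r1", "ACGTACGTAC"),
      ("r2", "CGTACGTACG"),
      ("r3", "TTTTGGGGCC")
    ))

    val k = 5
    // Emit (k-mer, readId), then collect the reads sharing each k-mer.
    val sharedKmers = reads
      .flatMap { case (id, seq) => seq.sliding(k).map(kmer => (kmer, id)) }
      .groupByKey()
      .filter { case (_, ids) => ids.size > 1 }

    sharedKmers.collect().foreach(println)
    spark.stop()
  }
}
```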

Accelerating Apache Spark with RDMA

Yuval Degani from Mellanox presented this talk at the OpenFabrics Workshop. “In this talk, we present a Java-based RDMA network layer for Apache Spark. The implementation optimizes both the RPC and the shuffle mechanisms for RDMA. Initial benchmarking shows up to a 25% improvement for Spark applications.”
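
Spark's pluggable shuffle-manager interface is how such a layer is wired in. Below is a sketch of enabling Mellanox's open source SparkRDMA plugin; the class name is taken from the plugin's documentation, and the plugin jar must be on the driver and executor classpaths.

```scala
// Sketch (e.g., in spark-shell): swap in an RDMA shuffle manager via
// Spark's pluggable shuffle API. Class name per the SparkRDMA plugin docs;
// verify it against the version deployed on your cluster.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("RdmaShuffleDemo")
  .config("spark.shuffle.manager",
    "org.apache.spark.shuffle.rdma.RdmaShuffleManager")
  .getOrCreate()
```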

Accelerating Hadoop, Spark, and Memcached with HPC Technologies

“This talk will present RDMA-based designs using OpenFabrics Verbs and heterogeneous storage architectures to accelerate multiple components of Hadoop (HDFS, MapReduce, RPC, and HBase), Spark, and Memcached. It will also give an overview of the associated RDMA-enabled software libraries, designed and publicly distributed as part of the HiBD project.”

Introduction to Data Science with Spark

The Data Science with Spark Workshop addresses high-level parallelization for data analytics workloads using the Apache Spark framework. Participants will learn how to prototype with Spark and how to exploit large HPC machines such as Piz Daint, the CSCS flagship system.
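
For a sense of the starting point, here is the kind of minimal Spark prototype such a workshop might begin with; the input file and column names are hypothetical.

```scala
// A first Spark prototype: load a CSV and compute a grouped aggregate.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col}

object FirstPrototype {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("FirstPrototype").getOrCreate()

    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("measurements.csv") // hypothetical input file

    // Average value per station, largest first.
    df.groupBy("station")
      .agg(avg("value").as("meanValue"))
      .orderBy(col("meanValue").desc)
      .show()

    spark.stop()
  }
}
```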

Compressing Software Development Cycles with Supercomputer-based Spark

“Do you need to compress your software development cycles for services deployed at scale and accelerate your data-driven insights? Are you delivering solutions that automate decision making and model complexity using analytics and machine learning on Spark? Find out how a pre-integrated analytics platform that’s tuned for memory-intensive workloads and powered by the industry-leading interconnect will empower your data science and software development teams to deliver amazing results for your business. Learn how Cray’s supercomputing approach in an enterprise package can help you excel at scale.”

Re-Architecting Spark For Performance Understandability

“This talk will describe Monotasks, a new architecture for the core of Spark that makes performance easier to reason about. In Spark today, pervasive parallelism and pipelining make it difficult to answer even simple performance questions like ‘what is the bottleneck for this workload?’ As a result, it’s difficult for developers to know what to optimize, and it’s even more difficult for users to understand what hardware to use and what configuration parameters to set to get the best performance.”