Accelerating Apache Spark with RDMA

Yuval Degani from Mellanox presented this talk at the OpenFabrics Workshop. “In this talk, we present a Java-based RDMA network layer for Apache Spark. The implementation optimizes both the RPC and the Shuffle mechanisms for RDMA. Initial benchmarking shows up to 25% improvement for Spark applications.”
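
Spark's shuffle layer is pluggable, which is how an RDMA implementation can be dropped in without patching Spark itself. A minimal sketch of what enabling such a plugin might look like in Scala; the RdmaShuffleManager class name and JAR path are illustrative assumptions, not details confirmed from the talk:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Hypothetical configuration: swap Spark's default shuffle manager for
    // an RDMA-based one shipped as a plugin JAR. The class name and paths
    // below are placeholders; consult the plugin's own documentation.
    val conf = new SparkConf()
      .setAppName("rdma-shuffle-demo")
      .set("spark.shuffle.manager",
           "org.apache.spark.shuffle.rdma.RdmaShuffleManager")
      .set("spark.driver.extraClassPath", "/path/to/spark-rdma-plugin.jar")
      .set("spark.executor.extraClassPath", "/path/to/spark-rdma-plugin.jar")

    val spark = SparkSession.builder.config(conf).getOrCreate()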

Accelerating Hadoop, Spark, and Memcached with HPC Technologies

“This talk will present RDMA-based designs using OpenFabrics Verbs and heterogeneous storage architectures to accelerate multiple components of Hadoop (HDFS, MapReduce, RPC, and HBase), Spark, and Memcached. An overview of the associated RDMA-enabled software libraries, designed and publicly distributed as part of the HiBD project, will also be presented.”

Introduction to Data Science with Spark

The Data Science with Spark Workshop addresses high-level parallelization for data analytics workloads using the Apache Spark framework. Participants will learn how to prototype with Spark and how to exploit large HPC machines such as Piz Daint, the CSCS flagship system.
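
To give a flavor of the kind of prototyping the workshop targets, here is a minimal Spark session in Scala that loads a CSV file and computes a grouped aggregate; the dataset path and column names are placeholders invented for illustration:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.avg

    val spark = SparkSession.builder
      .appName("workshop-prototype")
      .getOrCreate()

    // Read a CSV file into a DataFrame, letting Spark infer the schema.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/experiments.csv")

    // Compute the mean runtime per experiment group and print it.
    df.groupBy("experiment")
      .agg(avg("runtime_seconds").alias("mean_runtime"))
      .show()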

Compressing Software Development Cycles with Supercomputer-based Spark

“Do you need to compress your software development cycles for services deployed at scale and accelerate your data-driven insights? Are you delivering solutions that automate decision making & model complexity using analytics and machine learning on Spark? Find out how a pre-integrated analytics platform that’s tuned for memory-intensive workloads and powered by the industry-leading interconnect will empower your data science and software development teams to deliver amazing results for your business. Learn how Cray’s supercomputing approach in an enterprise package can help you excel at scale.”

Re-Architecting Spark For Performance Understandability

“This talk will describe Monotasks, a new architecture for the core of Spark that makes performance easier to reason about. In Spark today, pervasive parallelism and pipelining make it difficult to answer even simple performance questions like ‘what is the bottleneck for this workload?’ As a result, it’s difficult for developers to know what to optimize, and it’s even more difficult for users to understand what hardware to use and what configuration parameters to set to get the best performance.”
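
To see why this is hard, consider the per-task metrics that stock Spark already exposes through its listener API: compute, shuffle, and I/O phases are pipelined, so the numbers overlap in time and rarely point at a single bottleneck. A minimal sketch using Spark's standard SparkListener (not the Monotasks prototype, which is not a public API):

    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    // Log the raw per-task timing metrics as each task finishes. Because
    // these phases run concurrently within a task, they overlap and do
    // not sum to a clean bottleneck breakdown.
    class TaskTimeListener extends SparkListener {
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        val m = taskEnd.taskMetrics
        if (m != null) {
          println(s"stage=${taskEnd.stageId} " +
            s"runMs=${m.executorRunTime} " +
            s"cpuNs=${m.executorCpuTime} " +
            s"shuffleFetchWaitMs=${m.shuffleReadMetrics.fetchWaitTime}")
        }
      }
    }

    // Register on an existing SparkContext:
    // sc.addSparkListener(new TaskTimeListener)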

Programming for High Performance Processors

“Managing the work on each node can be referred to as domain parallelism. During the run of the application, the work assigned to each node can generally be isolated from that of other nodes. Each node can work on its own and needs little communication with the others to perform its share of the work. The primary developer tool for this is MPI, though frameworks such as Hadoop and Spark (for big data analytics) can also be leveraged. Managing the work for each core or thread requires control one level down. This type of work will typically invoke a large number of independent tasks that must then share data among themselves.”
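
On the Spark side, the same domain-parallel pattern shows up as per-partition processing: each partition is handled independently on whichever node holds it, and only small partial results are communicated at the end. A minimal illustrative sketch in Scala:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder
      .appName("domain-parallelism")
      .getOrCreate()
    val sc = spark.sparkContext

    // Split the input into 64 partitions (the "domains").
    val data = sc.parallelize(1 to 1000000, numSlices = 64)

    // Each task sums its own partition in isolation, with no
    // communication between nodes during this step.
    val partialSums = data.mapPartitions(iter =>
      Iterator(iter.map(_.toLong).sum))

    // Only the small per-partition results are exchanged and merged.
    val total = partialSums.reduce(_ + _)
    println(s"total = $total")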

Building a Platform for Collaborative Scientific Research on AWS

“The pharmaceutical industry trend toward joint ventures and collaborations has created a need for new platforms in which to work together. We’ll dive into architectural decisions for building collaborative systems. Examples include how such a platform allowed Human Longevity, Inc. to accelerate software deployment to production in a fast-paced research environment, and how Celgene uses AWS for research collaboration with outside universities and foundations.”

Cray Urika-GX System to Tackle Big Data Analytics

“We took the Aries system interconnect from our supercomputers, the industry-standard architecture of our clusters, the scalable graph engine from the Urika-GD appliance, and the pre-integrated, open infrastructure of our Urika-XA system and combined them into one agile analytics platform. The Urika-GX gives our customers the tool they need to overcome their most advanced analytics challenges today, and the platform to bridge to tomorrow.”

Learn Apache Hadoop with Spark in One Day

Hadoop and Spark clusters have a reputation for being extremely difficult to configure, install, and tune, but help is on the way. The good folks at Cluster Monkey are hosting a crash course entitled Apache Hadoop with Spark in One Day. “After completing the workshop attendees will be able to use and navigate a production Hadoop cluster and develop their own projects by building on the workshop examples.”
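
For readers who want a head start on such examples, the canonical first exercise on a Hadoop/Spark cluster is word count; here is a minimal Scala version, with placeholder HDFS paths:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder
      .appName("wordcount")
      .getOrCreate()
    val sc = spark.sparkContext

    // Read a text file from HDFS, split it into words, and count
    // occurrences of each word across the cluster.
    val counts = sc.textFile("hdfs:///user/workshop/input.txt")
      .flatMap(line => line.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///user/workshop/output")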

Changes Afoot from the HPC Crystal Ball

In this special guest feature from Scientific Computing World, Andrew Jones from NAG looks ahead at what 2016 has in store for HPC and finds people, not technology, to be the most important issue. “A disconcertingly large proportion of the software used in computational science and engineering today was written for friendlier and less complex technology. An explosion of attention is needed to drag software into a state where it can effectively deliver science using future HPC platforms.”