Sign up for our newsletter and get the latest HPC news and analysis.
Send me information from insideHPC:

Transaction Processing Performance Council Launches TPCx-HS Big Data Benchmark

Today the Transaction Processing Performance Council (TPC) announced the immediate availability of TPCx-HS Version 2, extending the original benchmark’s scope to include the Spark execution framework and cloud services. “Enterprise investment in Big Data analytics tools is growing exponentially, to keep pace with the rapid expansion of datasets,” said Tariq Magdon-Ismail, chairman of the TPCx-HS committee and staff engineer at VMware. “This is leading to an explosion in new hardware and software solutions for collecting and analyzing data. So there is enormous demand for robust, industry standard benchmarks to enable direct comparison of disparate Big Data systems across both hardware and software stacks, either on-premise or in the cloud. TPCx-HS Version 2 significantly enhances the original benchmark’s scope, and based on industry feedback, we expect immediate widespread interest.”

Lorena Barba Presents: Data Science for All

“In this new world, every citizen needs data science literacy. UC Berkeley is leading the way on broad curricular immersion with data science, and other universities will soon follow suit. The definitive data science curriculum has not been written, but the guiding principles are computational thinking, statistical inference, and making decisions based on data. “Bootcamp” courses don’t take this approach, focusing mostly on technical skills (programming, visualization, using packages). At many computer science departments, on the other hand, machine-learning courses with multiple pre-requisites are only accessible to majors. The key of Berkeley’s model is that it truly aims to be “Data Science for All.”

Dr. Eng Lim Goh presents: HPC & AI Technology Trends

Dr. Eng Lim Goh from Hewlett Packard Enterprise gave this talk at the HPC User Forum. “SGI’s highly complementary portfolio, including its in-memory high-performance data analytics technology and leading high-performance computing solutions will extend and strengthen HPE’s current leadership position in the growing mission critical and high-performance computing segments of the server market.”

Accelerating Apache Spark with RDMA

Yuval Degani from Mellanox presented this talk at the OpenFabrics Workshop. “In this talk, we present a Java-based, RDMA network layer for Apache Spark. The implementation optimized both the RPC and the Shuffle mechanisms for RDMA. Initial benchmarking shows up to 25% improvement for Spark Applications.”

Accelerating Hadoop, Spark, and Memcached with HPC Technologies

“This talk will present RDMA-based designs using OpenFabrics Verbs and heterogeneous storage architectures to accelerate multiple components of Hadoop (HDFS, MapReduce, RPC, and HBase), Spark and Memcached. An overview of the associated RDMA-enabled software libraries (being designed and publicly distributed as a part of the HiBD project for Apache Hadoop.”

insideBIGDATA Guide to Data Analytics in Government

This technology guide will help government thought leaders in how best to use data analytics to manage and derive value from an increased dependence on data. “This guide provides an in-depth overview of the use of data analytics technology in the public sector. Focus is given to how data analytics is being used in the government setting with a number of high-profile use case examples, how the Internet-of-Things is taking a firm hold in helping government agencies collect and find insights in a broadening number of data sources, how government sponsored healthcare and life sciences are expanding, as well as how cyber security and data analytics are helping to secure government applications.”

Re-Architecting Spark For Performance Understandability

“This talk will describe Monotasks, a new architecture for the core of Spark that makes performance easier to reason about. In Spark today, pervasive parallelism and pipelining make it difficult to answer even simple performance questions like “what is the bottleneck for this workload?” As a result, it’s difficult for developers to know what to optimize, and it’s even more difficult for users to understand what hardware to use and what configuration parameters to set to get the best performance.”

Bringing HPC Algorithms to Big Data Platforms

Nikolay Malitsky from Brookhaven National Laboratory presented this talk at the Spark Summit East conference. “This talk will present a MPI-based extension of the Spark platform developed in the context of light source facilities. The background and rationale of this extension are described in the paper “Bringing the HPC reconstruction algorithms to Big Data platforms.” which highlighted a gap between two modern driving forces of the scientific discovery process: HPC and Big Data technologies. As a result, it proposed to extend the Spark platform with inter-worker communication for supporting scientific-oriented parallel applications.”

Jennifer Chayes from Microsoft Research to Keynote ISC 2017

Today ISC 2017 announced that data scientist, Prof. Dr. Jennifer Tour Chayes from Microsoft Research will give the opening keynote at the conference. “I’ll discuss in some detail two particular applications: the very efficient machine learning algorithms for doing collaborative filtering on massive sparse networks of users and products, like the Netflix network; and the inference algorithms on cancer genomic data to suggest possible drug targets for certain kinds of cancer,” explains Chayes.

Intel DAAL Accelerates Data Analytics and Machine Learning

Intel DAAL is a high-performance library specifically optimized for big data analysis on the latest Intel platforms, including Intel Xeon®, and Intel Xeon Phi™. It provides the algorithmic building blocks for all stages in data analysis in offline, batch, streaming, and distributed processing environments. It was designed for efficient use over all the popular data platforms and APIs in use today, including MPI, Hadoop, Spark, R, MATLAB, Python, C++, and Java.