Intel DAAL is a high-performance library specifically optimized for big data analysis on the latest Intel platforms, including Intel Xeon®, and Intel Xeon Phi™. It provides the algorithmic building blocks for all stages in data analysis in offline, batch, streaming, and distributed processing environments. It was designed for efficient use over all the popular data platforms and APIs in use today, including MPI, Hadoop, Spark, R, MATLAB, Python, C++, and Java.
“Increased system size and a greater reliance on utilizing system parallelism to achieve computational needs, requires innovative system architectures to meet the simulation challenges. As a step towards a new network class of co-processors intelligent network devices, which manipulate data traversing the data-center network, SHARP technology designed to offload collective operation processing to the network. This tutorial will provide an overview of SHARP technology, integration with MPI, SHARP software components and live example of running MPI collectives.”
DK Panda from Ohio State University presented this deck at the 2017 HPC Advisory Council Stanford Conference. “This talk will focus on challenges in designing runtime environments for exascale systems with millions of processors and accelerators to support various programming models. We will focus on MPI, PGAS (OpenSHMEM, CAF, UPC and UPC++) and Hybrid MPI+PGAS programming models by taking into account support for multi-core, high-performance networks, accelerators (GPGPUs and Intel MIC), virtualization technologies (KVM, Docker, and Singularity), and energy-awareness. Features and sample performance numbers from the MVAPICH2 libraries will be presented.”
The move away from the traditional single processor/memory design has fostered new programming paradigms that address multiple processors (cores). Existing single core applications need to be modified to use extra processors (and accelerators). Unfortunately there is no single portable and efficient programming solution that addresses both scale-up and scale-out systems.
“Managing the work on each node can be referred to as Domain parallelism. During the run of the application, the work assigned to each node can be generally isolated from other nodes. The node can work on its own and needs little communication with other nodes to perform the work. The tools that are needed for this are MPI for the developer, but can take advantage of frameworks such as Hadoop and Spark (for big data analytics). Managing the work for each core or thread will need one level down of control. This type of work will typically invoke a large number of independent tasks that must then share data between the tasks.”
As an HPC technology vendor, Mellanox is in the business of providing the leading-edge interconnects that drive many of the world’s fastest supercomputers. To learn more about what’s new for SC16, we caught up with Michael Kagan, CTO of Mellanox. “Moving InfiniBand beyond EDR to HDR is critical not only for HPC, but also for the numerous industries that are adopting AI and Big Data to make real business sense out the amount of data available and that we continue to collect on a daily basis.”
With the advent of the tremendous compute density of new processors, it is important to understand if an application can take advantage of multicore. “Developers should understand if an application might be ready to run in a highly vectorized or many core environment before attempting to do the work necessary to obtain the high performance that might be expected.”
“PSyclone was developed for the UK Met Office and is now a part of the build system for Dynamo, the dynamical core currently in development for the Met Office’s ‘next generation’ weather and climate model software. By generating the complex code needed to make use of thousands of processors, PSyclone leaves the Met Office scientists free to concentrate on the science aspects of the model. This means that they will not have to change their code from something that works on a single processing unit (or core) to something that runs on many thousands of cores.”
In this video from the 2016 Blue Waters Symposium, Andriy Kot from NCSA presents: Parallel I/O Best Practices.
“I am honored to have been asked to drive NCSA’s continuing mission as a world-class, integrative center for transdisciplinary convergent research, education, and innovation,” said Gropp. “Embracing advanced computing and domain collaborations across the University of Illinois at Urbana-Champaign campus and ensuring scientific communities have access to advanced digital resources will be at the heart of these efforts.”