Articles and news on parallel programming and code modernization

Advances in the Legion Programming Model

Wonchan Lee, Todd Warszawski, and Karthik Murthy gave this talk at the Stanford HPC Conference. “Legion is an exascale-ready parallel programming model that simplifies the mapping of a complex, large-scale simulation code on a modern heterogeneous supercomputer. Legion relieves scientists and engineers of several burdens: they no longer need to determine which tasks depend on other tasks, specify where calculations will occur, or manage the transmission of data to and from the processors. In this talk, we will focus on three aspects of the Legion programming system, namely, dynamic tracing, projection functions, and vectorization.”
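
As a hedged illustration of the programming style the talk describes, the sketch below follows the pattern of the public Legion tutorials: tasks are registered with the runtime and launched without the programmer specifying dependencies or data movement. The task IDs and names are placeholders, and details may differ across Legion versions.

// Minimal Legion-style sketch, loosely following the Legion tutorials.
// The runtime, not the programmer, discovers dependencies and moves data.
#include <cstdio>
#include <vector>
#include "legion.h"

using namespace Legion;

enum TaskIDs { TOP_LEVEL_TASK_ID, HELLO_TASK_ID };  // placeholder IDs

void hello_task(const Task *task, const std::vector<PhysicalRegion> &regions,
                Context ctx, Runtime *runtime) {
  printf("Hello from a Legion task\n");
}

void top_level_task(const Task *task, const std::vector<PhysicalRegion> &regions,
                    Context ctx, Runtime *runtime) {
  // Launch a subtask; Legion decides where it runs and what it must wait on.
  TaskLauncher launcher(HELLO_TASK_ID, TaskArgument(NULL, 0));
  runtime->execute_task(ctx, launcher);
}

int main(int argc, char **argv) {
  Runtime::set_top_level_task_id(TOP_LEVEL_TASK_ID);
  {
    TaskVariantRegistrar registrar(TOP_LEVEL_TASK_ID, "top_level");
    registrar.add_constraint(ProcessorConstraint(Processor::LOC_PROC));
    Runtime::preregister_task_variant<top_level_task>(registrar, "top_level");
  }
  {
    TaskVariantRegistrar registrar(HELLO_TASK_ID, "hello");
    registrar.add_constraint(ProcessorConstraint(Processor::LOC_PROC));
    Runtime::preregister_task_variant<hello_task>(registrar, "hello");
  }
  return Runtime::start(argc, argv);
}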

FPGA Programming Made Easy

In the past, using an FPGA meant learning a complex hardware description language such as Verilog or VHDL and targeting the design at a specific FPGA. “Using a familiar language such as OpenCL, developers can become productive sooner when deciding to use an FPGA for a specific purpose. OpenCL is portable and is designed to be used with almost any type of accelerator.”
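
As a rough sketch of the approach described above, the example below pairs an OpenCL C vector-add kernel with a minimal C++ host program using the standard OpenCL C API. For an FPGA target, the kernel is typically compiled offline (for example, to a hardware image with the Intel FPGA SDK for OpenCL) and loaded with clCreateProgramWithBinary; the run-time build shown here is the generic OpenCL path.

// Hedged sketch: OpenCL C kernel plus a minimal C++ host using the OpenCL C API.
// Error checking is omitted for brevity.
#include <CL/cl.h>
#include <cstdio>
#include <vector>

static const char *kSrc = R"CLC(
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *c) {
  int i = get_global_id(0);
  c[i] = a[i] + b[i];
}
)CLC";

int main() {
  const size_t n = 1024;
  std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

  cl_platform_id platform;
  clGetPlatformIDs(1, &platform, NULL);
  cl_device_id device;
  clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

  cl_int err;
  cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
  cl_command_queue q = clCreateCommandQueue(ctx, device, 0, &err);

  // For an FPGA, clCreateProgramWithBinary would load a precompiled image here.
  cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, NULL, &err);
  clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
  cl_kernel k = clCreateKernel(prog, "vadd", &err);

  cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                             n * sizeof(float), a.data(), &err);
  cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                             n * sizeof(float), b.data(), &err);
  cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, n * sizeof(float), NULL, &err);

  clSetKernelArg(k, 0, sizeof(cl_mem), &da);
  clSetKernelArg(k, 1, sizeof(cl_mem), &db);
  clSetKernelArg(k, 2, sizeof(cl_mem), &dc);

  size_t gws = n;
  clEnqueueNDRangeKernel(q, k, 1, NULL, &gws, NULL, 0, NULL, NULL);
  clEnqueueReadBuffer(q, dc, CL_TRUE, 0, n * sizeof(float), c.data(), 0, NULL, NULL);

  printf("c[0] = %f\n", c[0]);  // expect 3.0
  return 0;
}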

Intel MKL Speeds Up Automated Driving Workloads on the Intel Xeon Processor

The automated driving developer community typically uses Eigen*, a C++ math library, for the matrix operations required by the Extended Kalman Filter (EKF) algorithm. EKF usually involves many small matrices. However, most HPC library routines for matrix operations are optimized for large matrices. “Intel MKL provides a highly tuned xGEMM function for matrix-matrix multiplication, with special paths for small matrices. Eigen can take advantage of Intel MKL through use of a compiler flag. A significant speedup results when using Eigen and Intel MKL and compiling the automated driving apps with the latest Intel C++ compiler.”
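
A minimal sketch of the kind of code the article describes, assuming an illustrative state dimension of 6: an EKF predict step written with Eigen, where building with -DEIGEN_USE_MKL_ALL and linking against Intel MKL lets Eigen dispatch supported matrix products to MKL’s xGEMM.

// Hedged sketch: EKF predict step with Eigen.
// Compile with -DEIGEN_USE_MKL_ALL and link MKL to route supported
// products to MKL's xGEMM; the small sizes are exactly the EKF case.
#include <Eigen/Dense>
#include <iostream>

int main() {
  const int n = 6;                                             // illustrative state dimension
  Eigen::MatrixXd F = Eigen::MatrixXd::Identity(n, n);         // state-transition model
  Eigen::MatrixXd P = Eigen::MatrixXd::Identity(n, n);         // state covariance
  Eigen::MatrixXd Q = 0.01 * Eigen::MatrixXd::Identity(n, n);  // process noise
  Eigen::VectorXd x = Eigen::VectorXd::Zero(n);                // state estimate

  // Predict step: x = F x,  P = F P F^T + Q
  x = F * x;
  P = F * P * F.transpose() + Q;

  std::cout << "trace(P) = " << P.trace() << std::endl;
  return 0;
}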

First Experiences with Parallel Application Development in Fortran 2018

Damian Rouson from the Sourcery Institute gave this talk at the Stanford HPC Conference. “This talk will present performance and scalability results of the mini-app running on several platforms using up to 98,000 cores. A second application involves the use of teams of images (processes) that execute independently for ensembles of computational hydrology simulations using WRF-Hydro, the hydrological component of the Weather Research and Forecasting (WRF) model, also developed at NCAR. Early experiences with portability and programmability of Fortran 2018 will also be discussed.”

Performance Insights Using the Intel Advisor Python API

Tuning a complex application for today’s heterogeneous platforms requires an understanding of the application itself as well as familiarity with the tools that help pinpoint where in the code to look for bottlenecks. In general, optimizing the performance of an application follows a series of steps that apply to a wide range of applications.

Podcast: Open MPI for Exascale

In this Let’s Talk Exascale podcast, David Bernholdt from ORNL discusses the Open MPI for Exascale project, which focuses on the communication infrastructure of MPI, the Message Passing Interface, a widely used standard for interprocess communication in parallel computing. “Even though individual calls to the MPI library are short, applications may make millions or billions of them during the course of an execution, so performance improvements can have a significant overall impact on the application runtime.”
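
To make the point concrete, here is a hedged sketch (not taken from the project itself) of the kind of communication pattern described: a long loop of very short MPI calls, where per-call overhead rather than bandwidth dominates the runtime.

// Hedged sketch: many short MPI calls whose per-call overhead adds up.
// A small-message ping-pong between ranks 0 and 1, timed with MPI_Wtime.
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int iters = 100000;
  double payload = 0.0;                   // an 8-byte message
  double t0 = MPI_Wtime();
  if (size >= 2) {
    for (int i = 0; i < iters; ++i) {
      if (rank == 0) {
        MPI_Send(&payload, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&payload, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      } else if (rank == 1) {
        MPI_Recv(&payload, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&payload, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
      }
    }
  }
  double t1 = MPI_Wtime();
  if (rank == 0)
    printf("average round trip: %g us\n", (t1 - t0) / iters * 1e6);

  MPI_Finalize();
  return 0;
}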

Intel MKL Compact Matrix Functions Attain Significant Speedups

The latest version of the Intel® Math Kernel Library (MKL) offers vectorized compact functions for general and specialized computations on groups of small matrices. These functions rely on true SIMD (single instruction, multiple data) matrix computations and provide significant performance benefits compared to traditional techniques that exploit multithreading but rely on standard data formats.
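
The example below is a conceptual sketch only, not the Intel MKL compact API: it illustrates the interleaved data-layout idea behind compact functions, in which element (i,j) of every matrix in a batch is stored contiguously, so the innermost loop runs over the batch index and maps directly onto SIMD lanes.

// Hedged conceptual sketch (not the MKL compact API): a batch of small 3x3
// matrix multiplies in an interleaved layout so the batch loop vectorizes.
#include <vector>
#include <cstdio>

constexpr int N = 3;        // matrix dimension
constexpr int BATCH = 1024; // number of matrices in the batch

// Element (i,j) of matrix b is stored at index (i*N + j)*BATCH + b.
void batched_gemm_compact(const std::vector<double> &A,
                          const std::vector<double> &B,
                          std::vector<double> &C) {
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k) {
        const double *a = &A[(i * N + k) * BATCH];
        const double *b = &B[(k * N + j) * BATCH];
        double *c = &C[(i * N + j) * BATCH];
        // Unit-stride loop over the batch: easy for the compiler to vectorize.
        for (int m = 0; m < BATCH; ++m)
          c[m] += a[m] * b[m];
      }
}

int main() {
  std::vector<double> A(N * N * BATCH, 1.0), B(N * N * BATCH, 2.0),
      C(N * N * BATCH, 0.0);
  batched_gemm_compact(A, B, C);
  std::printf("C[0] = %f\n", C[0]);  // expect 6.0 for these inputs
  return 0;
}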

Flow Graph Analyzer – Speed Up Your Applications

With the Intel® Advisor Flow Graph Analyzer (FGA), applications such as those needed for autonomous driving can be developed and implemented on top of very high-performing software and hardware. Underneath the FGA sits Intel Threading Building Blocks (TBB), which takes advantage of the multiple cores available on virtually all systems today.
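
A minimal sketch of the kind of graph the Flow Graph Analyzer visualizes: two TBB flow-graph nodes connected by an edge, with work fed in through try_put. The node bodies here are placeholders.

// Hedged sketch: a tiny TBB flow graph with two function_nodes and one edge.
#include <tbb/flow_graph.h>
#include <cstdio>

int main() {
  tbb::flow::graph g;

  // First stage: square the incoming value; may run concurrently.
  tbb::flow::function_node<int, int> square(
      g, tbb::flow::unlimited, [](int v) { return v * v; });

  // Second stage: print the result; serialized.
  tbb::flow::function_node<int, tbb::flow::continue_msg> report(
      g, tbb::flow::serial, [](int v) {
        std::printf("result = %d\n", v);
        return tbb::flow::continue_msg();
      });

  tbb::flow::make_edge(square, report);

  for (int i = 1; i <= 4; ++i)
    square.try_put(i);        // feed work into the graph
  g.wait_for_all();           // wait for all messages to drain
  return 0;
}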

Call for Proposals: International Conference on Parallel Processing in Oregon

The 47th International Conference on Parallel Processing has issued its Call for Proposals. Sponsored by ACM SIGHPC, the event takes place August 13-16 in Eugene, Oregon. “Parallel and distributed computing is a central topic in science, engineering and society. ICPP, the International Conference on Parallel Processing, provides a forum for engineers and scientists in academia, industry and government to present their latest research findings in all aspects of parallel and distributed computing.”

Vectorization Now More Important Than Ever

Vectorization, the hardware optimization technique synonymous with early vector supercomputers like the Cray-1 (1975), has reappeared with even greater importance than before. Today, more than 40 years later, the AVX-512 vector instructions in the most recent many-core Intel Xeon and Intel® Xeon Phi™ processors can increase application performance by up to 16x for single-precision codes.
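
As a hedged illustration, the sketch below writes a single-precision SAXPY with AVX-512 intrinsics, processing 16 floats per 512-bit register. It assumes AVX-512 hardware and an appropriate compiler flag, such as -mavx512f (GCC/Clang) or -xCORE-AVX512 (Intel compiler); in practice, compilers can often auto-vectorize such loops without intrinsics.

// Hedged sketch: single-precision SAXPY using AVX-512 intrinsics.
#include <immintrin.h>
#include <cstdio>

void saxpy_avx512(float a, const float *x, float *y, int n) {
  int i = 0;
  __m512 va = _mm512_set1_ps(a);
  for (; i + 16 <= n; i += 16) {
    __m512 vx = _mm512_loadu_ps(x + i);
    __m512 vy = _mm512_loadu_ps(y + i);
    vy = _mm512_fmadd_ps(va, vx, vy);   // y = a*x + y, 16 lanes at a time
    _mm512_storeu_ps(y + i, vy);
  }
  for (; i < n; ++i)                    // scalar remainder
    y[i] = a * x[i] + y[i];
}

int main() {
  const int n = 1000;
  float x[n], y[n];
  for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
  saxpy_avx512(3.0f, x, y, n);
  std::printf("y[0] = %f\n", y[0]);     // expect 5.0
  return 0;
}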