TensorFlow Deep Learning Optimized for Modern Intel Architectures

Researchers at Google and Intel recently collaborated to extract the maximum performance from Intel® Xeon and Intel® Xeon Phi processors running TensorFlow*, a leading deep learning and machine learning framework. This effort resulted in significant performance gains and paves the way for similar gains on the next generation of Intel products. Optimizing Deep Neural Network (DNN) frameworks such as TensorFlow presents challenges not unlike those encountered with more traditional High Performance Computing applications for science and industry.

More Than Ever, Vectorization and Multithreading are Essential for Performance

Employing a hybrid of MPI across nodes in a cluster, multithreading with OpenMP* on each node, and vectorization of loops within each thread results in multiple performance gains. In fact, most application codes will run slower on the latest supercomputers if they run purely sequentially. This means that adding multithreading and vectorization to applications is now essential for running efficiently on the latest architectures.
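As a minimal sketch of the two on-node levels described above, the C function below combines multithreading and loop vectorization in a single OpenMP* directive. The function name `dot` and its signature are illustrative assumptions, not taken from the article, and the MPI level across nodes is left out. A compiler without OpenMP support enabled ignores the pragma and runs the loop serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: dot product of two arrays of length n.
 * The pragma asks an OpenMP-enabled compiler to split the loop across
 * threads and to vectorize each thread's share of the iterations.
 * The reduction clause gives every thread a private partial sum and
 * combines the partial sums when the loop finishes. */
double dot(const double *x, const double *y, int n)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    #pragma omp parallel for simd reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

With GCC or Clang the directive typically takes effect only when the code is built with `-fopenmp`; in a hybrid code, each MPI rank would call a routine like this on its local portion of the data.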

3X Performance Boost Using Intel Advisor and Intel Trace Analyzer in Astrophysics Simulations

On today’s processors, it is crucial to both vectorize (using SIMD instructions such as AVX*) and parallelize software to realize the full performance potential of the processor. By optimizing their MHD astrophysics applications with tools from Intel Parallel Studio XE, and running on the latest Intel hardware, the NSU team achieved a performance speed-up of 3X, cutting the standard time for calculating one problem from one week to just two days.

Speeding Up Big Data Analysis With Intel MKL and Intel DAAL

“New algorithms that can query massive amounts of data and draw conclusions have been developed, but these algorithms need to be optimized on the underlying hardware. This is where the expertise of vendors who develop the hardware can add tremendous value. Optimizing the underlying libraries that can execute with a high degree of parallelism will definitely lead to improved performance for the software and productivity gains for the organization.”

OpenACC Brings Directives to Accelerated Computing at ISC 2017

In this video from ISC 2017, Sunita Chandrasekaran and Michael Wolfe describe how OpenACC makes GPU-accelerated computing more accessible to scientists and engineers. “OpenACC is a user-driven directive-based performance-portable parallel programming model designed for scientists and engineers interested in porting their codes to a wide-variety of heterogeneous HPC hardware platforms and architectures with significantly less programming effort than required with a low-level model.”
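To illustrate what "directive-based" means here, the following is a minimal C sketch, not code from the video. The function name `saxpy` and the chosen data clauses are assumptions made for illustration. An OpenACC-capable compiler can offload the annotated loop to an accelerator; any other compiler ignores the pragma and runs the same loop on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for i in [0, n).
 * The directive marks the loop as parallel and describes data movement:
 * x is copied to the device, y is copied to the device and back.
 * The loop body itself is unchanged from the serial version. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

The point of the design is that the serial source remains valid C: the directive is a hint layered on top of it rather than a rewrite in a device-specific language.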

Go with Intel® Data Analytics Acceleration Library and Go*

Use of the Go* programming language and its developer community has grown significantly since its official launch by Google in 2009. Like many popular programming languages (C and Java come to mind), Go started as an experiment to design a new programming language that would fix some of the common problems of other languages and yet stay true to the basic tenets of modern programming: be scalable, productive, readable, enable robust development environments, and support networking and multiprocessing.

Video: OpenACC Update from ISC 2017

In this video from ISC 2017, Sunita Chandrasekaran and Michael Wolfe present an overview of OpenACC and a preview of upcoming GPU Hackathon events. “OpenACC is a user-driven directive-based performance-portable parallel programming model designed for scientists and engineers interested in porting their codes to a wide-variety of heterogeneous HPC hardware platforms and architectures with significantly less programming effort than required with a low-level model.”

The Virtual Institute – High Productivity Supercomputing Celebrates 10th Anniversary

“The mission of the Virtual Institute – High Productivity Supercomputing (VI-HPS) is to improve the quality and accelerate the development process of complex simulation codes in science and engineering that are being designed to run on highly-parallel computer systems. For this purpose, we are developing integrated state-of-the-art programming tools for high-performance computing that assist programmers in diagnosing programming errors and optimizing the performance of their applications.”

Performance Gains Using Libraries

In many cases, simulation applications rely on the same math functions that many other applications use. Rather than each developer recoding the same math functions over and over, libraries developed by experts can significantly speed up execution of the overall application. Because those experts understand the nuances of the hardware and the optimizations it allows, it is important that developers be familiar with the libraries that are available for HPC applications.

OpenACC Takes Off at ISC17

Today the OpenACC standards group announced plans to showcase new advancements and increasing momentum for their programming model at ISC 2017 in Frankfurt. “OpenACC is a user-driven directive-based performance-portable parallel programming model designed for scientists and engineers interested in porting their codes to a wide-variety of heterogeneous HPC hardware platforms and architectures with significantly less programming effort than required with a low-level or explicit model.”