Sign up for our newsletter and get the latest HPC news and analysis.
Send me information from insideHPC:


OpenMP at 20 Moving Forward to 5.0

This year, OpenMP*, the widely used API for shared memory parallelism supported in many C/C++ and Fortran compilers, turns 20. OpenMP is a great example of how hardware and software vendors, researchers, and academia, volunteering to work together, can successfully design a specification that benefits the entire developer community.

Intel Parallel Studio XE 2018 For Demanding HPC Applications

“For those that develop HPC applications, there are usually two main areas that must be considered. The first is the translation of the algorithm, whether simulation based, physics based or pure research into the code that a modern computer system can run. A second challenge is how to move from the implementation of an algorithm to the performance that takes advantage of modern CPUs and accelerators.”

Panel Looks at ARM’s Advance towards HPC

In this video from the ARM Research Summit, industry thought leaders discuss the challenges, opportunities, and technical hurdles for the ARM architecture to thrive in the HPC market. “The panel will focus on Arm in HPC, Software & Hardware differentiation and direction for exascale and beyond, thoughts on accelerators, and more!”

New PGI 17.7 Release Supports NVIDIA Volta GPUs

Today NVIDIA released Version 17.7 of PGI 2017 Compilers and Tools, delivering improved performance and programming simplicity to HPC developers who target multicore CPUs and heterogeneous GPU-accelerated systems.

Intel Parallel Studio XE 2018 Released

Intel has announced the release of Intel® Parallel Studio XE 2018, with updated compilers and developer tools. It is now available for downloading on a 30-day trial basis. ” This week’s formal release of the fully supported product is notable with new features that further enhance the toolset for accelerating HPC applications.”

Challenges and Opportunities for HPC Interconnects and MPI

“This talk will reflect on prior analysis of the challenges facing high-performance interconnect technologies intended to support extreme-scale scientific computing systems, how some of these challenges have been addressed, and what new challenges lay ahead. Many of these challenges can be attributed to the complexity created by hardware diversity, which has a direct impact on interconnect technology, but new challenges are also arising indirectly as reactions to other aspects of high-performance computing, such as alternative parallel programming models and more complex system usage models.”

TensorFlow Deep Learning Optimized for Modern Intel Architectures

Researchers at Google and Intel recently collaborated to extract the maximum performance from Intel® Xeon and Intel® Xeon Phi processors running TensorFlow*, a leading deep learning and machine learning framework. This effort resulted in significant performance gains and leads the way for ensuring similar gains from the next generation of products from Intel. Optimizing Deep Neural Network (DNN) models such as TensorFlow presents challenges not unlike those encountered with more traditional High Performance Computing applications for science and industry.

More Than Ever, Vectorization and Multithreading are Essential for Performance

Employing a hybrid of MPI across nodes in a cluster, multithreading with OpenMP* on each node, and vectorization of loops within each thread results in multiple performance gains. In fact, most application codes will run slower on the latest supercomputers if they run purely sequentially. This means that adding multithreading and vectorization to applications is now essential for running efficiently on the latest architectures.

3X Performance Boost Using Intel Advisor and Intel Trace Analyzer in Astrophysics Simulations

On today’s processors, it is crucial to both vectorize (using AVX* or SIMD* instructions) and parallelize software to realize the full performance potential of the processor. By optimizing their MHD astrophysics applications with tools from Intel Parallel Studio XE, and running on the latest Intel hardware, the NSU team achieved a performance speed-up of 3X, cutting the standard time for calculating one problem from one week to just two days.

Speeding Up Big Data Analysis With Intel MKL and Intel DAAL

“New algorithms that can query massive amounts of data an draw conclusions have been developed, but these algorithms need to be optimized on the underlying hardware. This is where the expertise of vendors who develop the hardware can add tremendous value. Optimizing the underlying libraries that can execute with a high degree of parallelism will definitely lead to improved performance for the software and productivity gains for the organization.”