

Parallel Applications Speed Up Manufacturing Product Development

The product design process has undergone a significant transformation now that supercomputing power is available at traditional workstation prices. With over 100 threads available to an application in a compact two-socket server, the scalability of applications used in the product design and development process is just a keyboard away for a wide range of engineers.

Intel Parallel Studio XE 2018 For Demanding HPC Applications

“For those that develop HPC applications, there are usually two main areas that must be considered. The first is the translation of the algorithm, whether simulation based, physics based or pure research into the code that a modern computer system can run. A second challenge is how to move from the implementation of an algorithm to the performance that takes advantage of modern CPUs and accelerators.”

The Internet of Things and Tuning

“Understanding how the pipeline slots are being utilized can greatly increase the performance of the application. If pipeline slots are blocked for some reason, performance will suffer. Likewise, getting an understanding of the various cache misses can lead to a better organization of the data. This can increase performance while reducing latencies of memory to CPU.”
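Studying cache misses often leads to a change of data layout. One common change is to replace an array of structures (AoS) with a structure of arrays (SoA). The C sketch below illustrates that idea and is not code from the article. The `ParticleAoS` and `ParticlesSoA` types and both functions are hypothetical. The cache-line arithmetic in the comments assumes 8-byte doubles and 64-byte cache lines.

```c
#include <stddef.h>

/* Array of structures: all fields of one particle sit next to each other. */
typedef struct {
    double x, y, z;   /* position */
    double mass;
} ParticleAoS;

/* Structure of arrays: each field is stored in its own contiguous array. */
typedef struct {
    double *x, *y, *z;
    double *mass;
} ParticlesSoA;

/* Summing only the masses still pulls every 32-byte struct through the
   cache: with 64-byte lines, 16 of every 64 bytes fetched are used. */
double total_mass_aos(const ParticleAoS *p, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += p[i].mass;
    return sum;
}

/* The SoA layout streams through one contiguous array, so every byte of
   each fetched cache line is used and the loop is easy to vectorize. */
double total_mass_soa(const ParticlesSoA *p, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += p->mass[i];
    return sum;
}
```

Both functions return the same value. The difference shows up in the cache-miss and memory-traffic counts a profiler reports.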

Internode Programming With MPI and Intel Xeon Phi Processor

“While MPI was originally developed for general purpose CPUs and is widely used in the HPC space in this capacity, MPI applications can also be developed and then deployed with the Intel Xeon Phi Processor. With the understanding of the algorithms that are used for a specific application, tremendous performance can be achieved by using a combination of OpenMP and MPI.”
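A common way to combine the two models is a two-level decomposition. MPI splits the global data across ranks, and OpenMP threads share the work inside each rank's block. The C sketch below assumes that pattern. `block_range` and `local_sum` are hypothetical helpers, and the MPI calls are left out so the sketch compiles without an MPI library. In a real program, `rank` and `nranks` would come from `MPI_Comm_rank` and `MPI_Comm_size`, and the per-rank partial sums would be combined with `MPI_Reduce` or `MPI_Allreduce`.

```c
#include <stddef.h>

/* Half-open index range [begin, end) owned by one MPI rank. */
typedef struct { size_t begin, end; } Range;

/* Split n items into `parts` nearly equal contiguous blocks; the first
   n % parts blocks receive one extra item. */
Range block_range(size_t n, size_t parts, size_t index) {
    size_t base = n / parts, extra = n % parts;
    size_t begin = index * base + (index < extra ? index : extra);
    size_t len = base + (index < extra ? 1 : 0);
    Range r = { begin, begin + len };
    return r;
}

/* Level 1 (MPI): this rank works only on its own block of the data.
   Level 2 (OpenMP): the threads of this rank share that block. */
double local_sum(const double *data, size_t n, size_t nranks, size_t rank) {
    Range mine = block_range(n, nranks, rank);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = mine.begin; i < mine.end; ++i)
        sum += data[i];
    return sum;   /* partial result; combine across ranks with MPI */
}
```

Without OpenMP support, the pragma is ignored and each rank's loop runs serially. The result does not change.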

Moving Toward the Cloud & Seamless HPC

This is the fifth and final entry in an insideHPC series that explores the HPC transition to the cloud and how this move can help create seamless HPC. This series, compiled in a complete Guide, covers cloud computing for HPC, why the OS is important, OpenStack fundamentals and more.

Feed The Cores – Memory Bandwidth Usage

“Memory bandwidth to the CPUs has always been important. There were typically CPU cores that would wait for the data (if not in cache) from main memory. However, with the advanced capabilities of the Intel Xeon Phi processor, there are new concepts to understand and take advantage of.”
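A simple way to see whether cores are waiting on memory is to run a bandwidth-bound kernel, such as the triad from John McCalpin's STREAM benchmark. This C sketch is generic rather than specific to Xeon Phi. `triad`, `triad_bytes`, and `bandwidth_gbs` are hypothetical names. The byte count follows STREAM's convention, which ignores write-allocate traffic.

```c
#include <stddef.h>

/* STREAM-style triad: a[i] = b[i] + scalar * c[i].
   Two loads and one store per element with very little arithmetic, so the
   loop is limited by memory bandwidth rather than by the cores. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double scalar, size_t n) {
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] + scalar * c[i];
}

/* Bytes moved, counted as STREAM does: 3 arrays x 8 bytes per element. */
double triad_bytes(size_t n) {
    return 3.0 * (double)sizeof(double) * (double)n;
}

/* Effective bandwidth in GB/s (10^9 bytes per second) for a measured time. */
double bandwidth_gbs(size_t n, double seconds) {
    return triad_bytes(n) / seconds / 1e9;
}
```

To get an effective bandwidth, time the call to `triad` (for example with `omp_get_wtime`) and pass the elapsed seconds to `bandwidth_gbs`. You can then compare that figure with the platform's peak memory bandwidth.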

Speeding Up Big Data Analysis With Intel MKL and Intel DAAL

“New algorithms that can query massive amounts of data and draw conclusions have been developed, but these algorithms need to be optimized on the underlying hardware. This is where the expertise of vendors who develop the hardware can add tremendous value. Optimizing the underlying libraries that can execute with a high degree of parallelism will definitely lead to improved performance for the software and productivity gains for the organization.”

Cycles Per Instruction – Why it matters

Because CPI is a ratio of CPU cycles to instructions executed, comparing one version of a section of code with another requires holding one of the two values constant to tell whether an optimization is working. If a new version uses more CPU cycles but also executes more instructions, the ratio could stay the same, so CPI alone will not show whether anything improved. The goal is to lower the CPI in specific parts of the code as well as for the application overall.
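A worked example with made-up counter values (not measurements) shows why one side of the ratio has to be held fixed. Version A is the baseline. Version B executes more instructions. Version C executes the same instructions as A in fewer cycles.

$$\mathrm{CPI} = \frac{\text{CPU cycles}}{\text{instructions executed}}$$

$$\begin{aligned}
\text{A (baseline):}\quad & \frac{2.0\times10^{9}}{1.0\times10^{9}} = 2.0 \\
\text{B (more instructions):}\quad & \frac{3.0\times10^{9}}{1.5\times10^{9}} = 2.0 \\
\text{C (same instructions):}\quad & \frac{1.5\times10^{9}}{1.0\times10^{9}} = 1.5
\end{aligned}$$

B reports the same CPI as A but spends 50% more cycles, so it is actually slower. C keeps the instruction count fixed, so its lower CPI reflects a real 25% reduction in cycles.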

Performance Gains Using Libraries

In many cases, applications that perform various simulations use the same math functions as many other applications. Rather than having each developer recode these functions over and over, applications can call libraries developed by experts, which can significantly speed up the overall application. Because these experts understand many nuances of the hardware and can build in optimizations that an application developer might miss, it is important that developers be familiar with the libraries available for HPC applications.

Vectorization with AVX-512 Intrinsics

“With the Intel compilers, intrinsics are recognized and the instructions are generated in-line which is a tremendous advantage. Since the Intel Xeon Phi processor when using the AVX-512 intrinsics can perform a tremendous number of floating point operations per second, it is beneficial to use intrinsics for certain math computations. To use intrinsics, all that is needed is the proper header file and then to call the desired intrinsic function.”
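The C sketch below illustrates the header-plus-intrinsic workflow by computing y = a*x + y with 512-bit vectors. It is a hypothetical example, not code from the article, and `daxpy` is our own name. The AVX-512 path compiles only when the compiler defines `__AVX512F__`, for example with `-mavx512f` in GCC or Clang, or `-xMIC-AVX512` for Xeon Phi with the Intel compiler. A scalar loop handles leftover elements and builds without AVX-512.

```c
#include <stddef.h>
#include <immintrin.h>   /* header that declares the AVX-512 intrinsics */

/* y[i] = a * x[i] + y[i]. On the AVX-512 path, each iteration of the
   vector loop processes 8 doubles held in one 512-bit register. */
void daxpy(double a, const double *x, double *y, size_t n) {
    size_t i = 0;
#ifdef __AVX512F__
    __m512d va = _mm512_set1_pd(a);            /* broadcast a into all 8 lanes */
    for (; i + 8 <= n; i += 8) {
        __m512d vx = _mm512_loadu_pd(x + i);   /* unaligned load of 8 doubles */
        __m512d vy = _mm512_loadu_pd(y + i);
        vy = _mm512_fmadd_pd(va, vx, vy);      /* fused multiply-add: va*vx + vy */
        _mm512_storeu_pd(y + i, vy);           /* write 8 results back */
    }
#endif
    /* Remainder loop; it covers every element when AVX-512 is not enabled. */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The fused multiply-add rounds once instead of twice. For general inputs, its results can therefore differ in the last bit from the scalar loop.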