

Are Memory Bottlenecks Limiting Your Application’s Performance?

Often, it’s not enough to parallelize and vectorize an application to get the best performance. You also need to take a deep dive into how the application is accessing memory to find and eliminate bottlenecks in the code that could ultimately be limiting performance. Intel Advisor, a component of both Intel Parallel Studio XE and Intel System Studio, can help you identify and diagnose memory performance issues, and suggest strategies to improve the efficiency of your code.
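
As a hypothetical illustration of the kind of issue such an analysis can surface, the two loops below sum the same row-major array: the first traverses it with a large stride, while the loop-interchanged version accesses memory contiguously and is far friendlier to both the cache and the vectorizer.

```cpp
#include <vector>
#include <cstddef>

// Strided access: each step of the inner loop jumps 'cols' elements,
// the sort of cache-unfriendly pattern a memory-access analysis flags.
double sum_column_major(const std::vector<double>& a,
                        std::size_t rows, std::size_t cols) {
    double sum = 0.0;
    for (std::size_t j = 0; j < cols; ++j)
        for (std::size_t i = 0; i < rows; ++i)
            sum += a[i * cols + j];   // stride of 'cols' doubles per access
    return sum;
}

// Unit-stride access: interchanging the loops walks memory contiguously.
double sum_row_major(const std::vector<double>& a,
                     std::size_t rows, std::size_t cols) {
    double sum = 0.0;
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            sum += a[i * cols + j];   // stride of 1
    return sum;
}
```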

NVMe SSDs and DRAM for High-Performing Mixed Workloads

This guest article from Kingston Technology covers how NVMe SSDs and DRAM are being used for high-performing mixed workloads.

Memory Modes For Increased Performance on Intel Xeon Phi

The Intel Xeon Phi processor supports two types of memory, MCDRAM and DDR, which can be organized into three memory modes. These memory subsystems are complementary, but they can be used in different ways depending on the application being executed. “Using these two types of memory in the same system gives flexibility to the overall system and will show an increase in performance for almost any application.”
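
As a minimal sketch, assuming the processor is running in flat memory mode and the memkind library is installed, bandwidth-critical data can be placed in MCDRAM through the hbwmalloc interface while capacity-bound data stays in ordinary DDR.

```cpp
#include <hbwmalloc.h>   // memkind's high-bandwidth-memory interface; link with -lmemkind
#include <cstdio>
#include <cstdlib>

int main() {
    const size_t n = 1 << 20;

    // Bandwidth-critical array goes to MCDRAM...
    double* hot = static_cast<double*>(hbw_malloc(n * sizeof(double)));
    // ...while the capacity-bound array stays in DDR.
    double* cold = static_cast<double*>(malloc(n * sizeof(double)));

    if (!hot || !cold) {
        std::fprintf(stderr, "allocation failed\n");
        return 1;
    }

    for (size_t i = 0; i < n; ++i) {
        hot[i]  = 1.0;
        cold[i] = 2.0;
    }

    hbw_free(hot);
    std::free(cold);
    return 0;
}
```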

Intel Xeon Phi Coprocessor Architecture

“High performance systems now typically include a host processor and a coprocessor. The role of the coprocessor is to give the developer and the user the ability to significantly speed up simulations, provided the algorithm can run with a high degree of parallelization and can take advantage of a SIMD architecture. The Intel Xeon Phi coprocessor is an example of a coprocessor that is used in many HPC systems today.”
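
As a small sketch, assuming an OpenMP-capable compiler, the kind of loop that maps well onto the coprocessor's wide SIMD units has independent, unit-stride iterations and no branches, so the compiler can spread the work across the vector lanes.

```cpp
#include <vector>
#include <cstddef>

// SAXPY-style kernel: every iteration is independent and accesses memory
// with unit stride, so the compiler can vectorize it across SIMD lanes.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    #pragma omp simd
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = a * x[i] + y[i];
}
```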

Offloading Application Segments to Intel Xeon Phi Coprocessors

Offloading to a coprocessor needs to be considered carefully because of the memory transfer requirements. When the data to be worked on resides in the memory of the main system, that data must be transferred to the coprocessor’s memory. The challenge arises because memory is not physically shared between the main system and the coprocessor.
“There are two offload models that the developer must consider when programming an application. The first is the non-shared memory model, and the second is the virtual shared memory model. Both of these models can be used in the same application.”
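
As a minimal sketch of the non-shared memory model, assuming the Intel compiler's offload pragmas (the exact clause spellings should be checked against the compiler documentation), the in/out/inout clauses declare the data that must be copied across PCIe to and from the coprocessor's memory.

```cpp
#include <cstddef>

// Scale an array on the coprocessor using the explicit-copy (non-shared)
// offload model: 'data' is copied to the coprocessor before the block runs
// and copied back when it finishes, since the memories are not shared.
void scale_on_coprocessor(float* data, std::size_t n, float factor) {
    #pragma offload target(mic:0) inout(data : length(n))
    {
        for (std::size_t i = 0; i < n; ++i)
            data[i] *= factor;
    }
}
```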

Prefetching Data for Intel Xeon Phi

“Prefetching on a coprocessor such as the Intel Xeon Phi coprocessor can be more important than on a main CPU such as the Intel Xeon CPUs. Since the cores on the Intel Xeon Phi coprocessor are in-order, they cannot hide memory latency the way an out-of-order CPU can. In addition, since the coprocessor does not have an L3 cache, L2 misses must access the slower memory subsystem.”
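
As a minimal sketch of software prefetching with the _mm_prefetch intrinsic (the prefetch distance of 16 elements is an assumption that would need tuning for a real kernel), issuing the prefetch well ahead of the use helps cover the latency of an L2 miss on an in-order core.

```cpp
#include <xmmintrin.h>
#include <cstddef>

double sum_with_prefetch(const double* a, std::size_t n) {
    const std::size_t dist = 16;   // assumed prefetch distance, in elements
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        // Request the cache line that will be needed 'dist' iterations from now.
        if (i + dist < n)
            _mm_prefetch(reinterpret_cast<const char*>(&a[i + dist]), _MM_HINT_T0);
        sum += a[i];
    }
    return sum;
}
```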