Recent Results Show HBM Can Make CPUs the Desired Platform for AI and HPC

Third-party performance benchmarks show CPUs with HBM2e memory now have sufficient memory bandwidth and computational capabilities to match GPU performance on many HPC and AI workloads. Recent Intel and third-party benchmarks now provide hard evidence that the upcoming Intel® Xeon® processors codenamed Sapphire Rapids with high bandwidth memory (fast, high bandwidth HBM2e memory) and Intel® Advanced Matrix Extensions can match the performance of GPUs for many AI and HPC workloads.

Intel Optane Persistent Memory Tops IO500 List

Intel announced today that its Optane persistent memory (PMem), in combination with Intel’s open-source distributed asynchronous object storage (DAOS) solution, has taken the no. 1 position on the Virtual Institute for I/O IO500 list for file system performance. Intel said DAOS, with 30 servers of Intel Optane PMem, came in ahead of systems on the […]

Deep Learning for Natural Language Processing – Choosing the Right GPU for the Job

In this new whitepaper from our friends over at Exxact Corporation we take a look at the important topic of deep learning for Natural Language Processing (NLP) and choosing the right GPU for the job. Focus is given to the latest developments in neural networks and deep learning systems, in particular a neural network architecture called transformers. Researchers have shown that transformer networks are particularly well suited for parallelization on GPU-based systems.

Are Memory Bottlenecks Limiting Your Application’s Performance?

Often, it’s not enough to parallelize and vectorize an application to get the best performance. You also need to take a deep dive into how the application is accessing memory to find and eliminate bottlenecks in the code that could ultimately be limiting performance. Intel Advisor, a component of both Intel Parallel Studio XE and Intel System Studio, can help you identify and diagnose memory performance issues, and suggest strategies to improve the efficiency of your code.

NVMe SSDs and DRAM for High Performing Mixed Workloads

This guest article from Kingston Technology covers how NVMe SSDs and DRAM are being used for high performing mixed workloads. 

Memory Modes For Increased Performance on Intel Xeon Phi

The Intel Xeon Phi processor supports different types of memory, and can organize this into three types of memory mode. The new processor from Intel contains two type of memory, MCDRAM and DDR memory. These different memory subsystems are complimentary but can be used in different ways, depending on the application that is being executed. “By using these two types of memory in the same system gives flexibility to the overall system and will show an increase in performance for almost any application.”

Intel Xeon Phi Coprocessor Architecture

“High performance systems now typically a host processor and a coprocessor. The role of the coprocessor is to provide the developer and the user the ability to significantly speed up simulations if the algorithm that is used can run with a high degree of parallelization and can take advantage of an SIMD architecture. The Intel Xeon Phi coprocessor is an example of a coprocessor that is used in many HPC systems today.”

Offloading Application Segments to Intel Xeon Phi Coprocessors

Offloading to a coprocessor does need to be considered carefully, due to the memory transfer requirements. When the data that is to be worked on resides in the memory of the main system, that data must be transferred to the coprocessor’s memory. The challenge arises because memory is not physically shared between the main system and the coprocessor.
“There are two offload models that the developer must consider when programming an application. The first is the non-shared memory model, and the second is the virtual shared memory model. Both of these models can be used in the same application.”

Prefetching Data for Intel Xeon Phi

“Prefetching on a coprocessor such as the Intel Xeon Phi coprocessor can be more important than on a main CPU such as the Intel Xeon CPUs. Since the cores on the Intel Xeon Phi coprocessor are in-order, they cannot hide memory latency as compared to an out-of-order CPU. In addition, since a coprocessor does not have an L3 cache, L2 misses must then access the slower memory subsystem.”