PreFetch for Intel Xeon Phi – Part 2

“An interesting aspect of prefetching is how far ahead of the data currently in use a prefetch is issued. This distance, defined as the number of iterations ahead at which the prefetch instruction is issued, is a critical parameter for success. The compiler determines the prefetch distance automatically, and the value it chose can be found in the compiler optimization reports.”
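The idea of a prefetch distance can be sketched by hand in C. This is a minimal illustration, not the compiler's own method: it assumes GCC or Clang (for the `__builtin_prefetch` builtin), and both the function name `sum_with_prefetch` and the value of `PREFETCH_DISTANCE` are hypothetical choices that would need tuning on real hardware.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many iterations ahead to prefetch. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting the cache to load the element that will be
   needed PREFETCH_DISTANCE iterations from now. */
double sum_with_prefetch(const double *a, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n) {
            /* GCC/Clang builtin: address, 0 = read access, 3 = high locality. */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += a[i];
    }
    return total;
}
```

A distance that is too small issues the prefetch too late to hide the memory latency, while one that is too large risks the data being evicted before it is used.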

Prefetching Data for Intel Xeon Phi

“Prefetching on a coprocessor such as the Intel Xeon Phi coprocessor can be more important than on a main CPU such as the Intel Xeon CPU. Because the cores on the Intel Xeon Phi coprocessor are in-order, they cannot hide memory latency as well as an out-of-order CPU can. In addition, since the coprocessor does not have an L3 cache, L2 misses must go to the slower memory subsystem.”

OpenMP and OpenCL on Intel Xeon Phi

“In a heterogeneous system that combines both the Intel Xeon CPU and the Intel Xeon Phi coprocessor, there are various options available to optimize applications. Whether one has an advantage over another depends to some extent on the application being run. The two methods can be compared as long as the algorithm lends itself to running with, and taking advantage of, either OpenMP or OpenCL.”

High-Performance Lustre* Storage Solution Helps Enable the Intel® Scalable System Framework

“Intel has incorporated Intel Solutions for Lustre Software as part of the Intel SSF because it provides the performance to move data and minimize storage bottlenecks. Lustre is also open source based, and already enjoys a wide foundation of deployments in research around the world, while gaining significant traction in enterprise HPC. Intel’s version of Lustre delivers a high-performance storage solution in the Intel SSF that next-generation HPC needs to move toward the era of Exascale.”

MultiLevel Parallelism with Intel Xeon Phi

“Combining MPI and OpenMP is a topic that many developers have explored in search of the optimal solution. Using OpenMP for outer loops with MPI within, or creating separate MPI processes with OpenMP within, can lead to different levels of performance. In most cases, determining which method will yield the best results requires a deep understanding of the application, not just rearranging directives.”

Shared Memory and MPI 3.0

As multi-socket and then multi-core systems have become the standard, the Message Passing Interface (MPI) has become one of the most popular programming models for applications that can run in parallel across many sockets and cores. Shared memory programming interfaces, such as OpenMP, have allowed developers to take advantage of systems that combine many individual servers with shared memory within each server. However, this has meant using two different programming models at the same time. The MPI 3.0 standard introduces a new MPI interprocess shared memory extension (MPI SHM).

The GPUltima: Up to a Petaflop of Networked GPUs in a Single Rack

In this week’s Industry Perspective, Katie Garrison of One Stop Systems explains how GPUltima allows HPC professionals to create a highly dense compute platform that delivers a petaflop of performance at greatly reduced cost and space requirements, providing the compute power needed to quickly process the amount of data generated in intensive applications.

Heterogeneous Streams with Intel Xeon Phi

Matrix multiplies can be decomposed into tiles and executed very fast on the latest generations of coprocessors. Intel has developed the hStreams library that supports task concurrency on heterogeneous platforms. The concurrency may be across nodes (Xeon, KNC, KNL-SB, KNL-LB); within a node for small matrix operations; and in the overlapping of computation and communication, particularly for tiled solutions. It relieves the user of complexity in dealing with thread affinitization, offloading, memory types, and memory affinitization.
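The tile decomposition mentioned above can be sketched in plain C. This example does not use the hStreams API; it only shows how a matrix multiply breaks into independent tile updates, each of which is the kind of unit a tasking library could schedule. The function name `matmul_tiled` and the tile size `TILE` are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical tile edge length; a real value depends on cache sizes. */
#define TILE 32

/* Compute C += A * B for n x n row-major matrices, one tile at a time. */
void matmul_tiled(const double *A, const double *B, double *C, size_t n)
{
    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t kk = 0; kk < n; kk += TILE) {
            for (size_t jj = 0; jj < n; jj += TILE) {
                /* Clamp tile bounds so n need not be a multiple of TILE. */
                size_t i_end = (ii + TILE < n) ? ii + TILE : n;
                size_t k_end = (kk + TILE < n) ? kk + TILE : n;
                size_t j_end = (jj + TILE < n) ? jj + TILE : n;

                /* Multiply one tile of A by one tile of B into a tile of C. */
                for (size_t i = ii; i < i_end; ++i)
                    for (size_t k = kk; k < k_end; ++k) {
                        double a_ik = A[i * n + k];
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a_ik * B[k * n + j];
                    }
            }
        }
    }
}
```

The caller is expected to zero `C` first if a plain product is wanted, since the routine accumulates into it.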

Enhanced Air Cooling with Internal Loop

Although liquid cooling is considered by many to be the future for data centers, the fact remains that some do not yet need to make a full transition to liquid cooling. Others are restricted until the next budget cycle. Whatever the reason, new technologies like Internal Loop are more affordable than liquid cooling and can replace less efficient air coolers. This enables HPC data centers to still utilize the highest performing CPUs and GPUs.

Tracing Radio Frequencies with Intel Xeon Phi

An interesting use of HPC technologies is in the area of understanding the propagation of radio frequency energy in an outdoor environment. “Applications of this type need to be completed in seconds to minutes to be useful. Since the tracing of each ray is independent of another ray, this type of application can be distributed easily among the many cores of the Intel Xeon Phi coprocessor.”
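Because each ray is independent, the outer loop over rays can be split across cores with no synchronization. The sketch below is a toy illustration of that structure only: the `Ray` type, `trace_one_ray`, and the flat-wall geometry are hypothetical stand-ins for a real propagation model, and the OpenMP pragma takes effect only when the code is compiled with OpenMP enabled (otherwise the loop simply runs serially).

```c
/* Toy ray: starts at the origin and travels in direction (dx, dy), dx > 0. */
typedef struct { double dx, dy; } Ray;

/* Hypothetical stand-in for a real propagation model: returns the height at
   which the ray crosses a vertical wall located at x = wall_x. */
static double trace_one_ray(Ray r, double wall_x)
{
    return r.dy / r.dx * wall_x;
}

/* Trace n rays. Each iteration reads only rays[i] and writes only hit_y[i],
   so iterations can be distributed among cores in any order. */
void trace_all_rays(const Ray *rays, double *hit_y, long n, double wall_x)
{
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        hit_y[i] = trace_one_ray(rays[i], wall_x);
}
```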