Expected later in 2016, Intel will be releasing production versions of its Knights Landing (KNL) 72-core coprocessor. These next generation coprocessors are impacting the physical design of the supercomputers now coming down the pike in a number of ways. One of the most dramatic changes is the significant increase in cooling requirements – these are high wattage chips that run very hot and present some interesting engineering challenges for systems designers.
“Vector instruction sets have progressed over time, and it important to use the most appropriate vector instruction set when running on specific hardware. The OpenMP SIMD directive allows the developer to explicitly tell the compiler to vectorize a loop. In this case, human intervention will override the compilers sense of dependencies, but that is OK if the developer knows their application well.”
Cloud computing has become a strong alternative to in house data centers for a large percentage of all enterprise needs. Most enterprises are adopting some form of could computing, with some estimates that as high as 90 % are putting workloads into a public cloud infrastructure. The whitepaper, Empowering Cloud Utilization with Cloud Bursting is an excellent summary of various options for enterprises that are planning for using a public cloud infrastructure.
“An interesting aspect to prefetching is the distance ahead of the data that is being used to prefetch more data. This is a critical parameter for success and can be defined as how many iterations ahead to issue a prefetch instruction, and can be referred to as the distance. A compiler will automatically determine the distance to prefetch, and can be determined by looking at the compiler optimization reports.”
“Prefetching on a coprocessor such as the Intel Xeon Phi coprocessor can be more important than on a main CPU such as the Intel Xeon CPUs. Since the cores on the Intel Xeon Phi coprocessor are in-order, they cannot hide memory latency as compared to an out-of-order CPU. In addition, since a coprocessor does not have an L3 cache, L2 misses must then access the slower memory subsystem.”
“In a heterogeneous system that combines both the Intel Xeon CPU and the Intel Xeon Phi coprocessor, there are various options available to optimize applications. Whether one has an advantage over another is somewhat dependent on the application that is being run. Comparisons can be made comparing the two methods, as long as the algorithm lends itself to run and take advantage of either OpenMP or OpenCL.”
“Intel has incorporated Intel Solutions for Lustre Software as part of the Intel SSF because it provides the performance to move data and minimize storage bottlenecks. Lustre is also open source based, and already enjoys a wide foundation of deployments in research around the world, while gaining significant traction in enterprise HPC. Intel’s version of Lustre delivers a high-performance storage solution in the Intel SSF that next-generation HPC needs to move toward the era of Exascale.”
“The combination of using both MPI and OpenMP is a topic that has been explored by many developers in order to determine the most optimum solution. Whether to use OpenMP for outer loops and MPI within, or by creating separate MPI processes and using OpenMP within can lead to various levels of performance. In most cases of determining which method will yield the best results will involve a deep understanding of the application, and not just rearranging directives.”
As multi-socket, then multi-core systems have become the standard, the Message Passing Interface (MPI) has become one of the most popular programming models for applications that can run in parallel using many sockets and cores. Shared memory programming interfaces, such as OpenMP, have allowed developers to take advantage of systems that combine many individual servers and shared memory within the server itself. However, two different programming models have been used at the same time. The MPI 3.0 standard allows for a new MPI interprocess shared memory extension (MPI SHM).
In this week’s industry Perspective, Katie Garrison of One Stop Systems explains how GPUltima allows HPC professionals to create a highly dense compute platform that delivers a petaflop of performance at greatly reduced cost and space requirements.compute power needed to quickly process the amount of data generated in intensive applications.