Performance-savvy HPC developers are in high demand today. Leaps in intra-node parallelism, memory performance, and capacity are about to collide head-on with applications that already struggle to exploit existing systems.
One of the most common tasks in numerical simulation is solving large, dense systems of linear equations. Thermal analysis, boundary element methods, and electromagnetic wave calculations all depend on solving these systems as quickly as possible. Offloading the work to a coprocessor such as the Intel Xeon Phi can greatly speed up these calculations.
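To make the dense-solve workload concrete, here is a minimal sketch of Gaussian elimination with partial pivoting in plain Python. The function name `solve_dense` is hypothetical and chosen only for illustration. The triple loop does on the order of n^3 arithmetic operations, which is why large systems benefit from highly parallel hardware. A production code would call a tuned library routine (for example LAPACK's `dgesv`) rather than anything like this.

```python
def solve_dense(a, b):
    """Solve a x = b for a dense square matrix a (list of rows).

    Illustrative sketch only: Gaussian elimination with partial pivoting.
    """
    n = len(a)
    # Build an augmented copy [a | b] so the caller's data is not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]

    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]

        # Eliminate the entries below the pivot.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]

    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


if __name__ == "__main__":
    a = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
    b = [8.0, -11.0, -3.0]
    print(solve_dense(a, b))
```

The inner elimination loops are independent across rows, which is the parallelism that multicore processors and coprocessors exploit in tuned implementations.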
“OpenCL is a fairly new programming model that is designed to help programmers get the most out of a variety of processing elements in heterogeneous environments. Many benchmarks that are available have demonstrated that excellent performance can be obtained over a wide variety of devices. Rather than lock an application into one specific accelerator, by using OpenCL, applications can be run on a number of different architectures with each showing excellent speedups over a native (host CPU) implementation.”
“Applications can be tuned to use both the Intel Xeon and the Intel Xeon Phi simultaneously, without modifying the code to just run on the coprocessor. Using a number of software tools from Intel, a coupled cluster method can be shown to gain tremendous performance with excellent scaling.”
In many large threaded applications, synchronizing all of the threads with barriers can result in significant wasted processing time. If the application lends itself to it, replacing strictly synchronous barriers with loosely synchronous ones can recover that lost time.
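One common way to loosen a barrier is to split it into two phases: a thread announces its arrival without blocking, continues with work that does not depend on other threads, and blocks only when it actually needs their results. The sketch below illustrates that idea in Python. The class `SplitPhaseBarrier` and the function `run` are hypothetical names introduced for this example, not part of any library or of the application discussed above.

```python
import threading


class SplitPhaseBarrier:
    """Hypothetical split-phase barrier: arrive() never blocks, wait() does."""

    def __init__(self, parties):
        self._parties = parties
        self._arrived = 0
        self._phase = 0
        self._cond = threading.Condition()

    def arrive(self):
        """Record this thread's arrival and return the phase to wait on later."""
        with self._cond:
            phase = self._phase
            self._arrived += 1
            if self._arrived == self._parties:
                # Last arrival completes the phase and wakes any waiters.
                self._arrived = 0
                self._phase += 1
                self._cond.notify_all()
            return phase

    def wait(self, phase):
        """Block until every thread has arrived for the given phase."""
        with self._cond:
            self._cond.wait_for(lambda: self._phase > phase)


def run(num_threads=4):
    barrier = SplitPhaseBarrier(num_threads)
    shared = [0] * num_threads   # values every thread needs after the barrier
    local = [0] * num_threads    # independent per-thread work
    totals = [0] * num_threads

    def worker(i):
        shared[i] = i + 1              # publish the value other threads need
        token = barrier.arrive()       # announce arrival, do not block
        local[i] = sum(range(1000))    # independent work overlaps the wait
        barrier.wait(token)            # block only when others' data is needed
        totals[i] = sum(shared)        # safe: every thread has published

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals, local


if __name__ == "__main__":
    print(run(4))
```

With a strictly synchronous barrier, the independent work would have to wait until the slowest thread arrived. Here it runs while slower threads are still catching up, which is where the recovered time comes from.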
The Navier-Stokes equations are widely solved because they describe the physics in a number of areas of interest to scientists and engineers. Solving them yields the flow velocity, from which other quantities of interest, such as pressure or temperature, may be determined.
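For reference, one commonly used form of the Navier-Stokes equations is shown below. It assumes incompressible flow with constant density, which the text above does not specify. Here u is the flow velocity, p the pressure, ρ the density, ν the kinematic viscosity, and f a body force per unit mass; obtaining temperature would typically require an additional energy equation.

```latex
\begin{aligned}
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u}
  &= -\frac{1}{\rho}\nabla p + \nu \nabla^{2}\mathbf{u} + \mathbf{f}, \\
\nabla \cdot \mathbf{u} &= 0 .
\end{aligned}
```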
Simulations of physical processes such as the waves in an ocean or the wake behind a boat, although similar in a number of ways, require different approaches. Because current systems are built from many parallel computational units, it is important to take advantage of the full range of architectural features. Using HYDRO2D as an example, the performance of a code can be examined and improved by exploiting those features.
For about 40 years, developers and users could count on increases in CPU performance to make applications run faster. Now that steady clock rate increases have given way to higher core counts and new instructions, rethinking algorithms, adopting the latest APIs, and using the latest compilers have become critical for the next generation of application performance enhancements.
With 69 contributors from academia and industry, a new book shows how to leverage parallelism on processors and coprocessors using the same programming approach, providing detailed illustrations of effective ways to combine Intel Xeon Phi coprocessors with multicore processors.
In this video from the 2014 Argonne Training Program on Extreme-Scale Computing, James Reinders presents: Computer Architecture and Structured Parallel Programming. “At ATPESC 2014, we captured 67 hours of lectures in 86 videos of presentations by pioneers and elites in the HPC community on topics ranging from programming techniques and numerical algorithms best suited for leading-edge HPC systems to trends in HPC architectures and software most likely to provide performance portability through the next decade and beyond.”