Intel® Cilk™ Plus is an extension to C and C++ that offers a quick and easy way to harness the power of both multicore and vector processing. The three Intel Cilk Plus keywords provide a simple yet surprisingly powerful model for parallel programming, while runtime and template libraries offer a well-tuned environment for building parallel applications.
“The Intel Xeon Phi coprocessor is an example of a many-core system that can greatly increase the performance of an application when used correctly. Simply taking a serial application and expecting tremendous performance gains is unrealistic. Rewriting parts of the application will be necessary to take advantage of the architecture of the Intel Xeon Phi coprocessor.”
“Parallel software and parallel hardware, used together, will give the best results for an application. If both the application and the processor are serial, there will obviously not be a great gain in performance. When the application is parallelized but the processor is serial, again there is no great gain. In the third combination, the application is serial and the processor is parallel; because the application cannot take advantage of the increased power of the hardware, there will not be a great performance boost. The best, and really the only, solution is to modify the application to run in parallel on high-performance parallel hardware.”
The performance-savvy HPC developer is in high demand today. Leaps in intra-node parallelism, memory performance and memory capacity are set to collide head-on with applications that already struggle to exploit existing systems.
One of the most widely used operations in numerical simulation is solving large, dense systems of linear equations. Thermal analysis, boundary element methods and electromagnetic wave calculations all depend on solving these large systems as fast as possible. Using a coprocessor such as the Intel Xeon Phi coprocessor can greatly speed up these calculations.
“OpenCL is a fairly new programming model designed to help programmers get the most out of a variety of processing elements in heterogeneous environments. Many available benchmarks have demonstrated that excellent performance can be obtained across a wide variety of devices. Rather than locking an application into one specific accelerator, OpenCL allows it to run on a number of different architectures, each showing excellent speedups over a native (host CPU) implementation.”
“Applications can be tuned to use the Intel Xeon processor and the Intel Xeon Phi coprocessor simultaneously, rather than modifying the code to run only on the coprocessor. Using a number of software tools from Intel, a coupled cluster method can be shown to gain substantial performance with excellent scaling.”
In many large threaded applications, synchronizing all of the threads with barriers can result in significant wasted processing time. If the application allows it, loosely synchronous barriers should be used instead of strictly synchronous ones, which can recover that lost time.
The Navier-Stokes equations are widely solved because they describe the physics of fluid flow in a number of areas of interest to scientists and engineers. By solving these equations, the flow velocity can be calculated, and other quantities of interest, such as pressure or temperature, can then be determined.
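For reference, in the common incompressible case (assuming constant density ρ and kinematic viscosity ν, with body force f per unit mass), the momentum and continuity equations for the velocity u and pressure p read as follows. Computing temperature requires an additional energy equation, which is not shown.

```latex
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\,\mathbf{u}
  = -\frac{1}{\rho}\nabla p + \nu\,\nabla^{2}\mathbf{u} + \mathbf{f},
\qquad
\nabla\cdot\mathbf{u} = 0
```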
Simulations of physical processes such as waves in an ocean or the wake behind a boat, although similar in a number of ways, require different approaches. Because current systems are built with many parallel computational units, it is important to take advantage of their full range of architectural features. With HYDRO2D as a case study, the code's performance can be examined and improved by taking advantage of a range of system features.