The Morton order is a mapping of multidimensional data to one dimension that preserves locality of the data. This is also known as Z-order. “By using Morton ordering as an alternative to row-major or column-major data storage, significant speedups can be achieved on the Intel Xeon Phi coprocessor or Intel Xeon CPU when performing matrix multiplies or matrix transposes.”
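To make the mapping concrete, here is a minimal sketch of 2D Morton (Z-order) indexing by bit interleaving. The function names `part1by1`, `morton_encode`, and `morton_decode` are illustrative choices, not taken from the quoted article, and the sketch assumes row and column indices that fit in 16 bits.

```python
def part1by1(x: int) -> int:
    """Spread the low 16 bits of x so that a zero bit follows each original bit."""
    x &= 0x0000FFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x


def compact1by1(x: int) -> int:
    """Inverse of part1by1: gather the even-position bits back together."""
    x &= 0x55555555
    x = (x | (x >> 1)) & 0x33333333
    x = (x | (x >> 2)) & 0x0F0F0F0F
    x = (x | (x >> 4)) & 0x00FF00FF
    x = (x | (x >> 8)) & 0x0000FFFF
    return x


def morton_encode(row: int, col: int) -> int:
    """Interleave bits: column bits land in even positions, row bits in odd ones."""
    return (part1by1(row) << 1) | part1by1(col)


def morton_decode(code: int) -> tuple:
    """Recover (row, col) from a Morton code produced by morton_encode."""
    return compact1by1(code >> 1), compact1by1(code)


# Visit a 4x4 matrix in Z-order: each 2x2 tile is contiguous in the 1D order.
z_order = sorted(((r, c) for r in range(4) for c in range(4)),
                 key=lambda rc: morton_encode(*rc))
```

Storing matrix elements at index `morton_encode(row, col)` keeps each small square tile contiguous in memory, which is the locality property the blurb refers to.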
“A parallel implementation of SpMV can be implemented, using OpenMP directives. However, by allocating memory for each core, data races can be eliminated and data locality can be exploited, leading to higher performance. Besides running on the main CPU, vectorization can be implemented on the Intel Xeon Phi coprocessor. By blocking the data in various chunks, various implementations on the Intel Xeon Phi coprocessor can be run and evaluated.”
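The idea of giving each worker its own output memory can be sketched as follows. This is a Python stand-in for the OpenMP approach the quote describes, not the article's implementation: the CSR layout, the helper names `spmv_rows` and `spmv_blocked`, and the `n_workers` parameter are all assumptions made for illustration. CPython threads will not deliver the speedup that OpenMP threads would; the point is only the structure, in which each worker writes to a private block of rows so that no two workers touch the same output element.

```python
from concurrent.futures import ThreadPoolExecutor


def spmv_rows(values, col_idx, row_ptr, x, row_start, row_end):
    """Compute y[row_start:row_end] for a CSR matrix into a private list."""
    y_block = [0.0] * (row_end - row_start)
    for i in range(row_start, row_end):
        acc = 0.0
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y_block[i - row_start] = acc
    return y_block


def spmv_blocked(values, col_idx, row_ptr, x, n_workers=2):
    """Split the rows into contiguous blocks, one private output per block."""
    n_rows = len(row_ptr) - 1
    chunk = max(1, -(-n_rows // n_workers))  # ceiling division
    bounds = [(s, min(s + chunk, n_rows)) for s in range(0, n_rows, chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        blocks = pool.map(
            lambda b: spmv_rows(values, col_idx, row_ptr, x, *b), bounds)
        # map() yields results in submission order, so concatenation is safe.
        return [v for block in blocks for v in block]


# CSR form of [[4, 0, 1], [0, 3, 0], [2, 0, 5]] multiplied by [1, 2, 3].
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = spmv_blocked(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

Because the row blocks are disjoint, no locking or reduction step is needed when the partial results are combined.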
One of the most common tasks in numerical simulation is solving large, dense systems of linear equations. Thermal analysis, boundary element methods, and electromagnetic wave calculations all depend on solving these systems as quickly as possible. Offloading the work to a coprocessor such as the Intel Xeon Phi can greatly speed up these calculations.
“Two components of ITAC, the Intel Trace Collector and the Intel Trace Analyzer, can be used to understand the performance and bottlenecks of a Monte Carlo simulation. When each of the strike prices are distributed to both the Intel Xeon cores and the Intel Xeon Phi coprocessor, the efficiency was about 79%, as the coprocessors can calculate the results much faster than the main CPU cores.”
Today, SGI and the IT4Innovations National Supercomputing Center in the Czech Republic announced the deployment of the Salomon supercomputer. With a peak performance of 2 Petaflops, Salomon is twenty times more powerful than its predecessor and is the most powerful supercomputer in Europe that runs on Intel Xeon Phi coprocessors.
Through profiling, developers and users can identify an application’s hotspots and then optimize those sections of the code. In addition to showing where time is spent within an application, profiling tools can reveal regions with little or no parallelism, along with other factors that may affect performance. Performance tuning guided by these results can help tremendously in many cases.
European researchers are welcome to use the world’s fastest supercomputer, the Tianhe-2, to pursue their research in collaboration with Chinese scientists and HPC specialists. “Enough Ivy Bridge Xeon E5 2692 processors had already been delivered to allow the Tianhe-2 to be upgraded from its current 55 Petaflops peak performance to the 100 Petaflops mark.”
The Distributed European Computing Initiative (DECI) has issued its 13th Call for Proposals for HPC Compute Resources. “Administered by PRACE, DECI enables European researchers to obtain access to the most powerful national (Tier-1) computing resources in Europe regardless of their country of origin or employment and to enhance the impact of European science and technology at the highest level.”
Applications that use 3D Finite Difference (3DFD) calculations are numerically intensive and can be optimized heavily to take advantage of the accelerators available in today’s systems. The performance of an implementation can and should be optimized using numerical stencils. Choices made when designing and implementing algorithms affect the Arithmetic Intensity (AI), which measures how efficient an implementation is by comparing the floating-point operations it performs with the memory accesses it makes.
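As a rough illustration of a stencil and its arithmetic intensity, the sketch below applies one sweep of a 7-point 3D stencil and computes AI as flops divided by bytes moved. The stencil shape, the coefficient names `c0` and `c1`, and the byte counts are assumptions chosen for the example (double precision, write-allocate traffic ignored); they are not figures from the article.

```python
def stencil_7pt(u, c0, c1):
    """Apply one 7-point stencil sweep to the interior of a grid u[z][y][x].

    Boundary points are copied through unchanged.
    """
    nz, ny, nx = len(u), len(u[0]), len(u[0][0])
    out = [[[u[z][y][x] for x in range(nx)] for y in range(ny)]
           for z in range(nz)]
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                # 5 adds + 1 multiply for the neighbors, then 1 multiply
                # and 1 add for the center term: 8 flops per point.
                out[z][y][x] = c0 * u[z][y][x] + c1 * (
                    u[z][y][x - 1] + u[z][y][x + 1]
                    + u[z][y - 1][x] + u[z][y + 1][x]
                    + u[z - 1][y][x] + u[z + 1][y][x]
                )
    return out


def arithmetic_intensity(flops_per_point, bytes_per_point):
    """AI = floating-point operations per byte of memory traffic."""
    return flops_per_point / bytes_per_point


FLOPS_PER_POINT = 8
# No cache reuse: 7 loads + 1 store of 8-byte doubles per point.
ai_no_reuse = arithmetic_intensity(FLOPS_PER_POINT, 8 * 8)
# Ideal reuse: each input is loaded once, plus 1 store per point.
ai_ideal_reuse = arithmetic_intensity(FLOPS_PER_POINT, 2 * 8)

grid = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
result = stencil_7pt(grid, 0.5, 0.25)
```

Under these assumptions, the same stencil ranges from 0.125 to 0.5 flops per byte depending only on how well neighbor values are reused from cache, which is why blocking and loop-ordering choices change the AI of an implementation.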