“Two components of ITAC, the Intel Trace Collector and the Intel Trace Analyzer can be used to understand the performance and bottlenecks of a Monte Carlo simulation. When each of the strike prices are distributed to both the Intel Xeon cores the Intel Xeon Phi coprocessor, the efficiency was about 79%, as the coprocessors can calculate the results much faster than the main CPU cores.”
The Intel Omni-Path Architecture (Intel® OPA) whitepaper goes through the multitude of improvements that Intel OPA technology provides to the HPC community. In particular, HPC readers will appreciate how collective operations can be optimized based on message size, collective communicator size and topology using the point-to-point send and receive primitives.
Designating the appropriate provider for large MPI applications is critical to taking advantage of all of the compute power available. “A modern HPC system with multiple host cpus and multiple coprocessors such as the Intel Xeon Phi coprocessor housed in numerous racks can be optimized for maximum application performance with intelligent thread placement.”
“The combination of using a host cpu such as an Intel Xeon combined with a dedicated coprocessor such as the Intel Xeon Phi coprocessor has been shown in many cases to improve the performance of an application by significant amounts. When the datasets are large enough, it makes sense to offload as much of the workload as possible. But is this the case when the potential offload data sets are not as large?”
“With the advent of massively parallel computing coprocessors, numerical optimization for deep-learning disciplines is now possible. Complex real-time pattern recognition, for example, that can be used for self driving cars and augmented reality can be developed and high performance achieved with the use of specialized, highly tuned libraries. By just using the Message Passing Interface (MPI) API, very high performance can be attained on hundreds to thousands of Intel Xeon Phi processors.”