The Smith-Waterman algorithm is widely used for pairwise DNA sequence alignment. The computation, which searches for patterns in very long strings over the DNA alphabet, is very demanding. On the Intel Xeon Phi, tremendous performance gains can be obtained, as long as the algorithm has been modified to take advantage of parallelism.
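To make the cost concrete, here is a minimal sketch of the Smith-Waterman scoring recurrence in plain Python. It is not the Xeon Phi implementation the article refers to. The function name `smith_waterman_score` and the scoring parameters (match +2, mismatch -1, linear gap -1) are assumptions chosen for illustration only.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between strings a and b.

    Scoring values are illustrative assumptions, not taken from the article.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Every cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another. That independence is one common source of the parallelism that vectorized and many-core versions exploit. The work grows with the product of the two sequence lengths, which is why long DNA strings are so demanding.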
“Our goal is to enable HPC developers to easily port applications across all major CPU and accelerator platforms with uniformly high performance using a common source code base,” said Douglas Miles, director of PGI Compilers & Tools at NVIDIA. “This capability will be particularly important in the race towards exascale computing in which there will be a variety of system architectures requiring a more flexible application programming approach.”
“Microphysics provides atmospheric heat and moisture tendencies. This module has been optimized to take advantage of the Intel Xeon Phi coprocessor. However, some manual optimization can lead to even greater performance gains. By using manual optimizations, the overall speedup on a host CPU (Intel Xeon E5-2670) was 2.8x, while the performance of running on the Intel Xeon Phi coprocessor was 3.5x.”
Bill Gropp presented this talk at the Argonne Training Program on Extreme-Scale Computing. “Where it is used as an alternative to MPI, OpenMP often has difficulty achieving the performance of MPI (MPI’s much-criticized requirement that the user directly manage data motion ensures that the programmer does in fact manage that memory motion, leading to improved performance). This suggests that other programming models can be productively combined with MPI as long as they complement, rather than replace, MPI.”
“Two components of ITAC, the Intel Trace Collector and the Intel Trace Analyzer, can be used to understand the performance and bottlenecks of a Monte Carlo simulation. When the strike prices are distributed to both the Intel Xeon cores and the Intel Xeon Phi coprocessor, the efficiency was about 79%, as the coprocessors can calculate the results much faster than the main CPU cores.”
The Intel Omni-Path Architecture (Intel® OPA) whitepaper goes through the multitude of improvements that Intel OPA technology provides to the HPC community. In particular, HPC readers will appreciate how collective operations can be optimized based on message size, collective communicator size and topology using the point-to-point send and receive primitives.
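As a rough illustration of how a collective can be built from point-to-point sends and receives, the sketch below computes the send schedule of a binomial-tree broadcast. This is a generic textbook technique, not necessarily what the whitepaper or any Intel OPA software uses. The function name `binomial_broadcast_schedule` is hypothetical, and the code only simulates the schedule rather than calling a real messaging library.

```python
def binomial_broadcast_schedule(num_ranks):
    """Return a list of rounds; each round is a list of (sender, receiver)
    pairs for a broadcast rooted at rank 0. Purely a simulation."""
    rounds = []
    have_data = 1  # ranks 0 .. have_data-1 already hold the message
    while have_data < num_ranks:
        # Every rank that holds the message forwards it to one new rank
        sends = [(src, src + have_data)
                 for src in range(have_data)
                 if src + have_data < num_ranks]
        rounds.append(sends)
        have_data *= 2
    return rounds


if __name__ == "__main__":
    for step, sends in enumerate(binomial_broadcast_schedule(8), start=1):
        print(step, sends)
```

With 8 ranks the tree finishes in 3 rounds, whereas a root that sends to each rank in turn needs 7 sends one after another. Which schedule is faster in practice depends on message size, communicator size, and topology, and those are the parameters the whitepaper says the selection can be tuned on.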
Designating the appropriate provider for large MPI applications is critical to taking advantage of all of the compute power available. “A modern HPC system with multiple host CPUs and multiple coprocessors, such as the Intel Xeon Phi coprocessor, housed in numerous racks can be optimized for maximum application performance with intelligent thread placement.”
“The combination of a host CPU such as an Intel Xeon with a dedicated coprocessor such as the Intel Xeon Phi coprocessor has been shown in many cases to improve the performance of an application by significant amounts. When the datasets are large enough, it makes sense to offload as much of the workload as possible. But is this the case when the potential offload data sets are not as large?”