Pairwise DNA Optimization using Intel Xeon Phi


The Smith-Waterman algorithm is widely used for pairwise DNA sequence alignment. The computation, which searches for patterns in very long strings over the DNA alphabet, is very demanding. On the Intel Xeon Phi, large performance gains can be obtained, provided the algorithm has been modified to take advantage of parallelism.
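To make the cost concrete, here is a minimal sketch of Smith-Waterman local-alignment scoring. It is not the optimized Xeon Phi code the article describes; the function name and the scoring values (match +2, mismatch -1, linear gap -1) are assumptions chosen only for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not taken from the article.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Open or extend a gap in one of the two sequences.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # Local alignment: a score never drops below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Every cell depends only on its upper, left, and upper-left neighbours, so the work grows with the product of the two sequence lengths, and cells on the same anti-diagonal can be computed independently. That independence is what parallel implementations typically exploit.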

PGI Accelerator Compilers Add OpenACC Support for x86


“Our goal is to enable HPC developers to easily port applications across all major CPU and accelerator platforms with uniformly high performance using a common source code base,” said Douglas Miles, director of PGI Compilers & Tools at NVIDIA. “This capability will be particularly important in the race towards exascale computing in which there will be a variety of system architectures requiring a more flexible application programming approach.”

WRF Microphysics Optimization


“Microphysics provides atmospheric heat and moisture tendencies. This module has been optimized to take advantage of the Intel Xeon Phi coprocessor. However, some manual optimization can lead to even greater performance gains. By using manual optimizations, the overall speedup on a host CPU (Intel Xeon E5-2670) was 2.8 X, while the performance of running on the Intel Xeon Phi coprocessor was 3.5 X.”

Bill Gropp Presents: MPI and Hybrid Programming Models


Bill Gropp presented this talk at the Argonne Training Program on Extreme-Scale Computing. “Where it is used as an alternative to MPI, OpenMP often has difficulty achieving the performance of MPI (MPI’s much-criticized requirement that the user directly manage data motion ensures that the programmer does in fact manage that memory motion, leading to improved performance). This suggests that other programming models can be productively combined with MPI as long as they complement, rather than replace, MPI.”

Heterogeneous MPI Application Optimization


“Two components of ITAC, the Intel Trace Collector and the Intel Trace Analyzer, can be used to understand the performance and bottlenecks of a Monte Carlo simulation. When the strike prices are distributed to both the Intel Xeon cores and the Intel Xeon Phi coprocessor, the efficiency was about 79%, as the coprocessors can calculate the results much faster than the main CPU cores.”
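The effect described in the excerpt, where faster workers finish early and wait for slower ones, can be sketched with a simple model. This is an illustration only: the function names, the worker rates, and the definition of efficiency used here (ideal time divided by actual time) are assumptions and are not necessarily how ITAC computes its figure.

```python
def equal_split_efficiency(rates, n_tasks):
    """Efficiency when n_tasks are split evenly across workers.

    rates: hypothetical throughput of each worker, in tasks per second.
    Efficiency is modelled as ideal time / actual time (an assumption).
    """
    per_worker = n_tasks / len(rates)
    # The slowest worker determines when the whole job finishes.
    elapsed = max(per_worker / r for r in rates)
    # Ideal time if every worker stayed busy until the very end.
    ideal = n_tasks / sum(rates)
    return ideal / elapsed


def proportional_split_efficiency(rates, n_tasks):
    """Efficiency when each worker gets a share proportional to its rate."""
    total = sum(rates)
    elapsed = max((n_tasks * r / total) / r for r in rates)
    ideal = n_tasks / total
    return ideal / elapsed


if __name__ == "__main__":
    rates = [1.0, 1.0, 3.0]  # hypothetical: two CPU ranks, one faster coprocessor rank
    print(equal_split_efficiency(rates, 300))
    print(proportional_split_efficiency(rates, 300))
```

In this model, an even split leaves the faster worker idle while the slower ones finish, whereas sizing each share to the worker's speed removes that idle time.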

New Intel® Omni-Path White Paper Details Technology Improvements

Rob Farber

The Intel Omni-Path Architecture (Intel® OPA) whitepaper goes through the multitude of improvements that Intel OPA technology provides to the HPC community. In particular, HPC readers will appreciate how collective operations can be optimized based on message size, collective communicator size and topology using the point-to-point send and receive primitives.

Titan Supercomputer Powers the Future of Forecasting


Knowing how the weather will behave in the near future is indispensable for countless human endeavors. Now, researchers at ECMWF are leveraging the computational power of the Titan supercomputer at Oak Ridge to improve weather forecasting.

Altair Launches PBS Pro 13


Today Altair announced the general availability of PBS Professional 13.0, the latest version of the market-leading software product for high-performance computing workload management and job scheduling on clusters and supercomputers.

Computing With MPI in Heterogeneous Environments


Designating the appropriate provider for large MPI applications is critical to taking advantage of all of the compute power available. “A modern HPC system with multiple host CPUs and multiple coprocessors such as the Intel Xeon Phi coprocessor housed in numerous racks can be optimized for maximum application performance with intelligent thread placement.”

Concurrent Kernel Offloading


“The combination of a host CPU such as an Intel Xeon with a dedicated coprocessor such as the Intel Xeon Phi coprocessor has been shown in many cases to improve the performance of an application by significant amounts. When the datasets are large enough, it makes sense to offload as much of the workload as possible. But is this the case when the potential offload data sets are not as large?”