New Intel® Omni-Path White Paper Details Technology Improvements

Rob Farber

The Intel Omni-Path Architecture (Intel® OPA) white paper walks through the many improvements that Intel OPA technology brings to the HPC community. In particular, HPC readers will appreciate how collective operations can be optimized based on message size, communicator size, and topology using the point-to-point send and receive primitives.
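As a rough illustration of how a collective can be composed from point-to-point sends and receives, the sketch below builds the send/receive schedule for a binomial-tree broadcast. This is a generic textbook pattern, not the algorithm described in the white paper or used by any particular MPI library; the function name `binomial_broadcast_schedule` and its parameters are hypothetical.

```python
def binomial_broadcast_schedule(num_ranks, root=0):
    """Return the rounds of a binomial-tree broadcast.

    Each round is a list of (sender, receiver) rank pairs that can run
    concurrently. The number of ranks holding the data doubles every
    round, so about log2(num_ranks) rounds are needed.
    """
    rounds = []
    distance = 1  # how many ranks (relative to root) already hold the data
    while distance < num_ranks:
        pairs = []
        for rel in range(distance):
            dst = rel + distance
            if dst < num_ranks:
                # Shift relative ranks so the tree is rooted at `root`.
                sender = (rel + root) % num_ranks
                receiver = (dst + root) % num_ranks
                pairs.append((sender, receiver))
        rounds.append(pairs)
        distance *= 2
    return rounds


if __name__ == "__main__":
    for step, pairs in enumerate(binomial_broadcast_schedule(8), start=1):
        print(f"round {step}: {pairs}")
```

A tree like this keeps the number of rounds small, which tends to suit short messages; for large messages or other topologies a different schedule may be a better fit, which is the kind of trade-off the message-size and communicator-size tuning refers to.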

Arithmetic Intensity of Stencil Operations

Applications that use 3D Finite Difference (3DFD) calculations are numerically intensive and can be heavily optimized to take advantage of the accelerators available in today’s systems. The performance of an implementation can and should be optimized using numerical stencils. Choices made when designing and implementing algorithms affect the Arithmetic Intensity (AI), a measure of an implementation’s efficiency expressed as the ratio of floating-point operations (flops) to memory accesses.
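To make the flops-to-memory ratio concrete, here is a minimal sketch that estimates AI for a star-shaped 3D stencil. The counting model is an assumption for illustration only: one multiply per stencil point plus the adds to combine them, and either perfect cache reuse (one load and one store per output point) or no reuse at all. The function name and parameters are hypothetical and do not come from the article.

```python
def stencil_arithmetic_intensity(radius, bytes_per_value=4, perfect_reuse=True):
    """Estimate flops per byte for a 3D star stencil of the given radius.

    Assumed model (not taken from the article):
      - the stencil touches 6 * radius + 1 points,
      - each point costs one multiply, and combining them costs points - 1 adds,
      - with perfect reuse each output needs one load and one store from memory;
        with no reuse every stencil point is loaded from memory.
    """
    points = 6 * radius + 1
    flops = 2 * points - 1
    loads = 1 if perfect_reuse else points
    bytes_moved = (loads + 1) * bytes_per_value  # +1 for the store
    return flops / bytes_moved


if __name__ == "__main__":
    # Radius 4 corresponds to an 8th-order stencil in each dimension.
    print(stencil_arithmetic_intensity(4))                       # best case
    print(stencil_arithmetic_intensity(4, perfect_reuse=False))  # worst case
```

The gap between the two cases shows why implementation choices such as cache blocking matter: the flop count is the same, but the memory traffic, and therefore the AI, changes by more than an order of magnitude under this model.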

OpenCL for Performance

“OpenCL is a fairly new programming model that is designed to help programmers get the most out of a variety of processing elements in heterogeneous environments. Many available benchmarks have demonstrated that excellent performance can be obtained over a wide variety of devices. Rather than locking an application into one specific accelerator, OpenCL lets applications run on a number of different architectures, with each showing excellent speedups over a native (host CPU) implementation.”

Research Demands More Compute Power and Faster Storage for Complex Computational Applications

Many universities, private research labs, and government research agencies have begun using High Performance Computing (HPC) servers, compute accelerators, and flash storage arrays to accelerate a wide array of research across disciplines in math, science, and engineering. These labs use GPUs for parallel processing and flash memory for storing large datasets. Many universities have HPC labs where students and researchers share resources in order to analyze and store vast amounts of data more quickly.

Ray Tracing with Embree Kernels using Intel Xeon Phi

The Embree kernel approach, using the Intel Xeon Phi coprocessor, is applicable to many situations. The implementation can be tuned to the hardware available, using different vector widths and workloads per ray. With a flexible toolkit for rendering, applications can take advantage of the latest hardware acceleration to achieve maximum performance.

Lustre* at the Core of HPC and Big Data Convergence

Companies already using High-performance Computing (HPC) with a Lustre file system for simulations, such as those in the financial, oil and gas, and manufacturing sectors, want to convert some of their HPC cycles to Big Data analytics. This puts Lustre at the core of the convergence of Big Data and HPC.

Black-Scholes Pricing on Intel Xeon Phi

“An expanding area of work both on the hardware front and the software side is to modify and optimize applications to run on both the host processor and a coprocessor. Many techniques to transform applications to reduce runtime have been discussed and implemented across a wide variety of applications.”

Nested Parallelism

The benefits of nested parallelism on highly threaded applications can be determined and quantified. With the number of cores in both the host CPU (Intel Xeon) and the coprocessor (Intel Xeon Phi) continuing to increase, much thought must be given to minimizing thread overhead when many threads need to be synchronized, as well as to the memory access for each processor (core). Tasks that can be spread across an entire system to exploit the algorithm’s parallelism should be mapped to NUMA nodes to make them more efficient.

Lustre* For the Enterprise

Lustre* is not just for the national labs any longer. It was born out of serving up data extremely fast to the world’s most powerful HPC clusters using parallel I/O to improve performance and scalability. Here are five reasons why Lustre is enterprise-ready.

Quantum Chemistry at Scale

“Applications can be tuned to use both the Intel Xeon and the Intel Xeon Phi simultaneously, without modifying the code to run only on the coprocessor. Using a number of software tools from Intel, a coupled cluster method can be shown to achieve substantial performance gains with excellent scaling.”