Since the beginnings of the Supercomputer era, users have been faced with the daunting task of rewriting their applications to take advantage of new next-generation machines. Performance just wasn’t portable. That may be changing.
Today Nvidia announced availability of version 15.10 of the PGI Accelerator Fortran, C and C++ compilers, adding support for the OpenACC directives-based parallel programming standard on x86 architecture multicore microprocessors.
The new PGI compilers deliver performance portability, allowing OpenACC-enabled source code to be compiled for parallel execution on a multicore CPU or a GPU accelerator. This capability provides tremendous flexibility for programmers, enabling them to develop applications that can take advantage of multiple system architectures with a single version of their source code.
“Our goal is to enable HPC developers to easily port applications across all major CPU and accelerator platforms with uniformly high performance using a common source code base,” said Douglas Miles, director of PGI Compilers & Tools at NVIDIA. “This capability will be particularly important in the race towards exascale computing in which there will be a variety of system architectures requiring a more flexible application programming approach.”
This new PGI feature compiles OpenACC compute regions for parallel execution across all of the cores in an x86 processor or multi-socket server. The cores are treated in aggregate as a shared-memory accelerator, eliminating all data movement overhead in the resulting OpenACC programs. By default the compiler generates code that uses all the available cores in the system, and several methods exist for programmers to control and fine-tune this behavior.
We were extremely impressed that we can run OpenACC on a CPU with no code change and get equivalent performance to our OpenMP/MPI implementation, and get 4x faster performance when running on a GPU,” said Wayne Gaudin of the U.K.’s Atomic Weapons Establishment. “From the perspective of performance portability and code future proofing, this is an excellent result.”
As the chart above shows, performance on multicore CPUs for HPC apps using MPI + OpenACC is equivalent to MPI + OpenMP code. Compiling and running the same code on a Tesla K80 GPU can provide large speedups.
Key benefits of running OpenACC on multicore CPUs include:
- Effective utilization of all cores of a multicore CPU or multi-socket server for parallel execution
- Common programming model across CPUs and GPUs in Fortran, C and C++
- Rapid exploitation of existing multicore parallelism in a program using the KERNELS directive, which enables incremental optimization for parallel execution
- Scalable performance across multicore CPUs and GPUs
Porting HPC applications from one platform to another is one of the most significant costs in the adoption of breakthrough hardware technologies,” said Buddy Bland, project director at Oak Ridge National Laboratory. “OpenACC for multicore x86 CPUs provides continuity and code portability from existing CPU-only and GPU-enabled applications from machines like Titan to all of DOE’s upcoming major systems as well as portability among those systems.”
Growing Momentum for OpenACC
There are more than 10,000 developers using OpenACC today, and several recent developments underscore the continually growing adoption of OpenACC in high performance computing. At recent hackathons conducted worldwide, experts across a variety of scientific domains have been accelerating their scientific applications with accelerators and OpenACC. These include applications in such diverse fields as MRI image reconstruction (PowerGrid), computational fluid dynamics (INCOMP3D, HiPSTAR and Numeca), cosmology and astrophysics (RAMSES, CASTRO and MAESTRO), quantum chemistry (LSDALTON), computational physics (NekCEM) and more.
In addition, Gaussian, Inc. has announced that it is using OpenACC to port the GAUSSIAN computational chemistry application to accelerators. At the recent iCAS2 conference on climate and weather in Annecy, France, Meteosuisse, the Swiss Federal Office of Meteorology and Climatology, announced the deployment of a GPU-accelerated version of COSMO, the world’s first production weather forecasting application running on GPU accelerators.
In a recent poll of 150 OpenACC developers, 94 percent of the respondents reported getting a speedup when running on an accelerator, and over 90 percent of the users would recommend OpenACC.
In addition to support for OpenACC on multicore CPUs, PGI version 15.10 includes a pre-production release of the PGI Fortran, C and C++ compilers for OpenPOWER CPUs with support for OpenACC on NVIDIA GPUs.
PGI 15.10 with support for OpenACC on multicore CPUs is expected to be available this month directly from PGI and authorized resellers. New users can register for a free 90-day trial as part of the NVIDIA OpenACC Toolkit. University students and faculty can apply for a free PGI license.