OpenACC is a directive-based programming model that gives C/C++ and Fortran programmers the ability to write parallel programs simply by augmenting their code with pragmas. Pragmas are advisory messages that expose optimization, parallelization, and accelerator offload opportunities to the compiler so it can generate efficient parallel code for a variety of target architectures, including AMD and NVIDIA GPUs as well as ARM, x86, Intel Xeon Phi, and IBM POWER processors. “Portable Performance Parallelism” describes the ability OpenACC gives programmers to build – from a single OpenACC-annotated source tree – parallel applications for all these architectures. It’s a tremendous time and money saver, and the best news is that it’s free!
1. Free high-quality compilers
On November 14th, PGI announced the PGI Community Edition. This suite of OpenACC enabled Fortran, C and C++ compilers and tools for macOS and Linux, both x86 and OpenPOWER, includes a no cost one year license and is available to everyone in academia and industry. Just look for the PGI Community edition on the PGI website. The GCC gomp4 branch already has early OpenACC support, thanks to the efforts of Mentor Graphics and the GCC community, with more extensive support coming in GCC 7.
2. Compilers are available from multiple sources
In addition to the GNU and PGI compilers, several no-charge research compilers also support OpenACC, including OpenARC, the Omni Compiler Project, and OpenUH. Of course, commercial compilers are also available from PathScale, PGI, and Cray. The PGI Professional compilers offer commercial support and access to previous releases, and they are released more frequently than the free PGI Community Edition.
3. OpenACC offers performance portability
As Michael Wolfe (Technology Chair OpenACC and Technology Lead, PGI) wrote in the foreword for Parallel Programming with OpenACC, “It’s all about Performance.” Parallelism and performance (as well as portability) are critical to projects using OpenACC, including Gaussian, the world’s most used quantum chemistry program, and ANSYS Fluent, the popular commercial Computational Fluid Dynamics program.
Sunil Sathe (Lead Software Developer, ANSYS Fluent) says, “The ability to compile OpenACC based source code for parallel execution on a variety of CPU/GPU hardware platforms makes it a great choice for our future programming model.” In addition, five of the thirteen applications the Oak Ridge Leadership Facility has selected for their Center for Accelerated Applications Readiness (CAAR) program are using or being ported to OpenACC. CAAR is focused on redesigning, porting, and optimizing application codes for OLCF’s next generation Summit hybrid CPU–GPU architecture. Summit is part of the $325M CORAL collaboration plus an additional $100M in research and development funding.
Dr. Haohuan Fu (Deputy Director, NSCC Wuxi) explains why the SWACC OpenACC compiler was developed to build applications for the Chinese TaihuLight system, the world’s fastest supercomputer according to the June 2016 Top 500 list: “The OpenACC paradigm was chosen for its better fit to our many-core processor, with a few extensions to better support the efficient utilization of the new hardware features”. The same advantages that benefit many-core architectures apply to GPUs as well. Dr. Fu observes, “OpenACC has been used to parallelize and tune CAM-SE, which is 530,000 lines of code, as well as WRF and tens of other real-world applications for the #1 system on the Top 500.” OpenACC also enables Gordon Bell levels of performance as exemplified by the NSCC Wuxi Gordon Bell submission, “10M-Core Scalable Fully-Implicit Solver for Nonhydrostatic Atmospheric Dynamics”.
4. OpenACC is complementary to OpenMP
Those who write parallel code using OpenMP pragmas will be happy to know that OpenACC pragmas can coexist side-by-side in the same code. This means there is a low barrier to entry, as existing OpenMP programs can be incrementally adapted to use OpenACC accelerators such as GPUs. Further, projects do not need to commit to OpenACC until they see the performance benefits for themselves in their own codes.
OpenACC code can also run on the host processor – just like OpenMP. The nice part is that two environment variables, ACC_DEVICE_TYPE and ACC_DEVICE_NUM, are included in the OpenACC standard. Thus, switching a unified binary from running on a GPU to running entirely on the host is as simple as setting ‘ACC_DEVICE_TYPE=host’ in your favorite shell. (Note: the OpenACC compiler must support generating a unified binary for this feature to work.)
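As a usage sketch (the binary name `a.out` is hypothetical, and the accepted device-type strings vary by compiler), retargeting a unified binary is just a matter of the environment:

```shell
# Hypothetical unified binary built with an OpenACC compiler, e.g. pgcc -acc
ACC_DEVICE_TYPE=nvidia ACC_DEVICE_NUM=0 ./a.out   # offload to the first NVIDIA GPU
ACC_DEVICE_TYPE=host                   ./a.out   # run the same binary entirely on the CPU
```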
5. It is easy to start on GPUs with OpenACC
Science, rather than programming, is the ultimate goal for many domain scientists. As a result, time spent on coding is not considered productive. OpenACC is designed with domain scientists in mind: it lowers the barriers to entry and allows more time for science rather than programming. Scientists report 2x to 10x performance increases with as little as a few weeks of effort when modifying applications to use OpenACC pragmas and running on GPUs.
It’s easy to get started as there are a plethora of online OpenACC training materials and courses available. Check out OpenACC.org for links to a variety of educational materials. NVIDIA has several recorded online OpenACC courses. XSEDE and PRACE also hold regular workshops, “GPU Programming Using OpenACC”.
Additionally, OpenACC hackathons are a popular way to get a quick hands-on introduction that leverages the experience and expertise of many people by challenging groups of hackers to modify code with OpenACC to deliver the greatest performance and speedup. A number of hackathons are listed on the OpenACC.org site.
Supercomputing 2016 is a good source for in-person OpenACC information. Check the SC16 OpenACC schedule. Events include a BoF (Birds of a Feather), a workshop and a number of vendor demonstrations and exhibits. It’s also possible to get a free copy of the book “Parallel Programming with OpenACC” at the Monday, November 14, 2016 book signing in the OpenACC booth from 7pm – 9pm.
The SC16 OpenACC BoF, “OpenACC API User Experience, Vendor Reaction, Relevance, and Roadmap”, will be very interesting (Wednesday, November 16th from 5:15pm to 7:00pm). In particular, the BoF will provide valuable information about current compiler projects including:
• The NSCC Wuxi Presentation on Sunway Taihu Light and the SWACC compiler
• OpenACC for All (x86, OpenPower, ARM, AMD/NVIDIA GPUs, Intel Xeon Phi)
• OpenACC in GCC
• OpenACC 2.6 Proposed additions
Other OpenACC activities at SC16 include the Third International Workshop on Accelerator Programming Using Directives (WACCPD) on Monday and a presentation on OpenMP and OpenACC on Tuesday. Don’t miss the likely-to-be-lively “When will OpenMP and OpenACC Merge?” presentation, as well as the “Bringing About HPC Open-Standards World Peace” panel.
A strong response thus far
The worldwide response thus far to OpenACC has been both strong and significant. For example, Jeffrey Vetter (HPC luminary and Joint Professor Georgia Institute of Technology) wrote: “OpenACC represents a major development for the scientific community. Programming models for open science by definition need to be flexible, open and portable across multiple platforms. OpenACC is well-designed to fill this need.”
The availability of free, high-quality OpenACC compilers along with the other four reasons discussed in this article will certainly make more people aware of OpenACC and the opportunities it provides for “Performant, Portable, Parallelism”.
Rob Farber is a global technology consultant and author with an extensive background in HPC and in developing machine learning technology that he applies at national labs and commercial organizations. He is also the editor of Parallel Programming with OpenACC.