OpenMP.org has posted the slides and audio from a day-long tutorial on Hybrid MPI and OpenMP Parallel Programming from SC13.
“The simulations were performed on the Titan system at the Oak Ridge National Laboratory, and exhibit excellent scalability up to 18,000 K20X NVIDIA GPUs, reaching 20 Petaflops of aggregate sustained performance with a peak performance of 27.5 Petaflops for the most intensive computing component.”
“NumbaPro is a powerful compiler that takes high-level Python code directly to the GPU, producing fast code that is the equivalent of programming in a lower-level language. It contains an implementation of CUDA Python as well as higher-level constructs that make it easy to map array-oriented code to the parallel architecture of the GPU.”
Dirk Pleiter from the Jülich Supercomputing Centre presents this talk from SC13. “In 2012, the NVIDIA Application Lab at Jülich was established to work with application developers on GPU enablement. In this talk we will tour through a variety of applications and evaluate opportunities of new GPU architectures and GPU-accelerated HPC systems, in particular for data-intensive applications.”
“This talk will review the deployment of petascale capabilities at ORNL that has led to the current architectural direction and will discuss the preparations aimed at ensuring a successful transition to heterogeneous architectures for some key simulation problems, including global atmospheric modeling.”
“It would be hard to put a flying snake in a wind tunnel. So we are trying to put them in GPUs instead—via computational fluid dynamics. Our initial success is to see that a flying snake’s cross-section can in fact create quite some lift: it even has a favorite angle of attack for which it gives extra lift. We don’t know if this is the secret of flying snakes, but we do know that looking at nature can teach engineers some new tricks.”
In this video from the NVIDIA booth at SC13, Michael Wolfe presents on OpenACC. “The OpenACC API provides a high-level, performance-portable programming mechanism for parallel programming of accelerated nodes. Learn about the latest additions to the OpenACC specification, and see the PGI Accelerator compilers in action targeting the fastest NVIDIA GPUs.”
“TSUBAME 2.5 succeeded TSUBAME 2.0 by upgrading all 4224 Tesla M2050 GPUs to Kepler K20X GPUs, achieving 5.76 / 17.1 Petaflops peak in double / single precision respectively, the latter being the fastest in Japan. By overcoming several technical challenges, TSUBAME 2.5 exhibits a 2-3x speedup and multi-petaflops performance for many applications, leading to TSUBAME 3.0 in 2015-16.”
“The new system will enable researchers to solve ever more complex problems, be it in the search for new materials, in the prediction of climate changes, or in other disciplines. With the planned GPU acceleration, the application performance and the energy efficiency of our simulations will improve significantly. We are very excited about the collaborative development with Cray and NVIDIA of a truly general purpose hybrid multi-core system.”
In this video from SC13, Ruud van der Pas from Oracle presents an overview of tasking in OpenMP 4.0.