“GPUs potentially offer exceptionally high memory bandwidth and performance for a wide range of applications. The challenge in utilizing such accelerators has been the difficulty in programming them. Enter GPU Hackathons; Our mentors come from national laboratories, universities and vendors, and besides having extensive experience in programming GPUs, many of them develop the GPU-capable compilers and help define standards such as OpenACC and OpenMP.”
This year, the OpenMP Architecture Review Board is celebrating the 20th anniversary of the first OpenMP API specification with a pair of events: OpenMPCon and the International Conference on OpenMP (IWOMP). Both events will take place the week of September 18 at Stony Brook University in New York. “Developers attending this year’s OpenMPCon and IWOMP conferences will have the added bonus of joining us to celebrate the vital contribution OpenMP has made by enabling high-performance computing over the past two decades and will also help us to shape OpenMP’s next twenty years.” said Michael Klemm, OpenMP CEO.”
The move away from the traditional single processor/memory design has fostered new programming paradigms that address multiple processors (cores). Existing single core applications need to be modified to use extra processors (and accelerators). Unfortunately there is no single portable and efficient programming solution that addresses both scale-up and scale-out systems.
“OpenMP, Fortran 2008 and TBB are standards that can help to create parallel areas of an application. MKL could also be considered to be part of this family, because it uses OpenMP within the library. OpenMP is well known and has been used for quite some time and is continues to be enhanced. Some estimates are as high as 75 % of cycles used are for Fortran applications. Thus, in order to modernize some of the most significant number crunchers today, Fortran 2008 should be investigated. TBB is for C++ applications only, and does not require compiler modifications. An additional benefit to using OpenMP and Fortran 2008 is that these are standards, which allows code to be more portable.”
“Managing the work on each node can be referred to as Domain parallelism. During the run of the application, the work assigned to each node can be generally isolated from other nodes. The node can work on its own and needs little communication with other nodes to perform the work. The tools that are needed for this are MPI for the developer, but can take advantage of frameworks such as Hadoop and Spark (for big data analytics). Managing the work for each core or thread will need one level down of control. This type of work will typically invoke a large number of independent tasks that must then share data between the tasks.”
Today ORNL announced the full schedule of 2017 GPU Hackathons at multiple locations around the world. “The goal of each hackathon is for current or prospective user groups of large hybrid CPU-GPU systems to send teams of at least 3 developers along with either (1) a (potentially) scalable application that could benefit from GPU accelerators, or (2) an application running on accelerators that need optimization. There will be intensive mentoring during this 5-day hands-on workshop, with the goal that the teams leave with applications running on GPUs, or at least with a clear roadmap of how to get there.”
Manuel Arenaz from Appentra presented this talk at the OpenMP booth at SC16. “Parallware is a new technology for static analysis of programs based on the production-grade LLVM compiler infrastructure. Using a fast, extensible hierarchical classification scheme to address dependence analysis, it discovers parallelism and annotates the source code with the most appropriate OpenMP & OpenACC directives.”
Today Appentra Solutions announced that the company will participate in the Emerging Technologies Showcase at SC16. As an HPC startup, Appentra was selected for its Parallware technology, an LLVM-based software technology that assists in the parallelization of scientific codes with OpenMP and OpenACC. “The new Parallware Trainer is a great tool for providing support to parallel programmers on their daily work,” said Xavier Martorell, Parallel Programming Models Group Manager at Barcelona Supercomputing Center.
“PSyclone was developed for the UK Met Office and is now a part of the build system for Dynamo, the dynamical core currently in development for the Met Office’s ‘next generation’ weather and climate model software. By generating the complex code needed to make use of thousands of processors, PSyclone leaves the Met Office scientists free to concentrate on the science aspects of the model. This means that they will not have to change their code from something that works on a single processing unit (or core) to something that runs on many thousands of cores.”
Six application development teams from NERSC gathered at Intel in early August for a marathon “dungeon session” designed to help tweak their codes for the next-generation Intel Xeon Phi Knight’s Landing manycore architecture and NERSC’s new Cori supercomputer. “We try to prepare ahead of time to bring the types of problems that can only be solved with the experts at Intel and Cray present—deep questions about the architecture and how applications use the Xeon Phi processor. It’s all geared toward optimizing the codes to run on the new manycore architecture and on Cori.”