Programmers cannot blindly guess which sections of their code might bottleneck performance. This problem is worsened when codes run across the variety of hardware platforms supported by the Exascale Computing Project (ECP). A section of code that runs well on one system might be a bottleneck on another system. Differing hardware execution models further compound the performance challenges that face application developers; these models can include the somewhat restricted SIMD (Single Instruction Multiple Data) and SIMT (Single Instruction Multiple Thread) computing for GPU models and the more complex and general MIMD (Multiple Instruction Multiple Data) for CPUs. New software programming models, such as Kokkos, also introduce multiple layers of abstraction and lambda functions that can hide or obscure the low-level execution details due to their complexity and anonymous nature. Differing memory systems inside a node and differences in the communications fabric that connect high-performance computing (HPC) nodes in a distributed supercomputer environment add even greater challenges in identifying performance bottlenecks during application performance analysis.
Podcast: Improving Parallel Applications with the TAU tool
In the podcast, Mike Bernhardt from ECP catches up with Sameer Shende to learn how the Performance Research Lab at the University of Oregon is helping to pave the way to Exascale. “Developers of parallel computing applications can well appreciate the Tuning and Analysis Utilities (TAU) performance evaluation tool—it helps them optimize their efforts. Sameer has worked with the TAU software for nearly two and a half decades and has released more than 200 versions of it. Whatever your application looks like, there’s a good chance that TAU can support it and help you improve your performance.”