In many large threaded applications, synchronizing all of the threads with barriers can result in significant wasted processing time. Where the application allows it, replacing strictly synchronous barriers with loosely synchronous ones can recover that lost time.
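The difference between the two barrier styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "loosely synchronous" is taken here to mean that each thread waits only for the threads it depends on (its immediate neighbours in a 1-D decomposition), and the function names `run_strict` and `run_loose` are hypothetical, not taken from any particular application.

```python
import threading


def run_strict(num_threads, num_steps):
    """Strict barrier: every thread waits for ALL others after each step."""
    barrier = threading.Barrier(num_threads)
    progress = [0] * num_threads  # steps completed by each worker

    def worker(i):
        for step in range(1, num_steps + 1):
            progress[i] = step   # placeholder for the real per-step work
            barrier.wait()       # global synchronization point

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return progress


def run_loose(num_threads, num_steps):
    """Assumed 'loose' scheme: a thread waits only for its two neighbours."""
    progress = [0] * num_threads
    log = []                      # (worker, step) in completion order
    cond = threading.Condition()

    def neighbours(i):
        return [j for j in (i - 1, i + 1) if 0 <= j < num_threads]

    def worker(i):
        for step in range(1, num_steps + 1):
            with cond:
                # Start this step once each neighbour has finished the previous one.
                cond.wait_for(
                    lambda: all(progress[j] >= step - 1 for j in neighbours(i))
                )
            # Placeholder: the real per-step work would run here, outside the lock.
            with cond:
                progress[i] = step
                log.append((i, step))
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return progress, log
```

In the loose version a fast thread can run ahead of distant slow threads, since it is held back only by its own neighbours; whether that is safe depends entirely on the data dependencies of the real application.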
Simulations of physical processes, such as the waves in an ocean or the wake behind a boat, are similar in a number of ways but require different approaches. Because current systems are built from many parallel computational units, it is important to take advantage of the full range of architectural features. Using HYDRO2D, the performance of the code can be examined and then improved by exploiting those features.
For about 40 years, developers and users could count on increases in CPU performance to make applications run faster. Now that steady clock-rate increases have given way to higher core counts and new instructions, rethinking algorithms, adopting the latest APIs, and using the latest compilers have become critical for the next generation of application performance gains.
In this video from the 2014 Argonne Training Program on Extreme-Scale Computing, James Reinders presents: Computer Architecture and Structured Parallel Programming. “At ATPESC 2014, we captured 67 hours of lectures in 86 videos of presentations by pioneers and elites in the HPC community on topics ranging from programming techniques and numerical algorithms best suited for leading-edge HPC systems to trends in HPC architectures and software most likely to provide performance portability through the next decade and beyond.”