Search Results for: Roofline analysis

Optimizing HPC Code with Roofline Analysis

In this special guest feature, James Reinders describes why roofline estimation is a great tool for code optimization in HPC. “As a long-time teacher of optimization techniques, I can confidently say that Roofline analysis is a must-have for anyone optimizing for performance. This has not always been the case. As I will explain, today it is an important technique to draw upon when doing performance optimization.”

Intel Advisor Roofline Analysis Finds New Opportunities for Optimizing Application Performance

Intel Advisor, an integral part of Intel Parallel Studio XE 2017, can help identify portions of code that could be good candidates for parallelization (both vectorization and threading). It can also help determine when it might not be appropriate to parallelize a section of code, depending on the platform, processor, and configuration it’s running on. Intel Advisor Roofline Analysis reveals the gap between an application’s performance and its expected performance.

Exascale Hardware Evaluation: Workflow Analysis for Supercomputer Procurements

It is well known in the high-performance computing (HPC) community that many (perhaps most) HPC workloads exhibit dynamic performance envelopes that can stress the memory, compute, network, and storage capabilities of modern supercomputers. Optimizing these workloads to run efficiently on existing hardware is challenging; quantifying their performance envelopes well enough to predict performance on new system architectures is even more challenging, albeit essential. This predictive analysis helps each data center’s supercomputer procurement team identify the machines and system architectures that will deliver the most performance for its production workloads. Once a supercomputer has been installed, configured, made available to users, and benchmarked, it is too late to consider fundamental architectural changes.

Improving HPC Performance with the Roofline Model

“When we are optimizing, our objective is to determine which hardware resource the code is exhausting (there must be one, otherwise it would run faster!), and then see how to modify the code to reduce its need for that resource. It is therefore essential to understand the maximum theoretical performance of that aspect of the machine, since if we are already achieving the peak performance we should give up, or choose a different algorithm.”
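The reasoning in the quote above can be made concrete with the basic roofline bound: attainable performance is the smaller of peak compute throughput and arithmetic intensity multiplied by peak memory bandwidth. The following is a minimal sketch, not taken from the article; the function names and the machine numbers (1000 GFLOP/s peak, 100 GB/s bandwidth) are hypothetical and chosen only for illustration.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Basic roofline bound in GFLOP/s.

    arithmetic_intensity: FLOPs performed per byte moved from memory.
    peak_gflops:          peak compute throughput (GFLOP/s).
    peak_bandwidth_gbs:   peak memory bandwidth (GB/s).
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    # The kernel cannot exceed either ceiling, so take the lower one.
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def limiting_resource(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Name the resource that caps performance at this intensity."""
    # The ridge point is the intensity at which the two ceilings meet.
    ridge_point = peak_gflops / peak_bandwidth_gbs
    if arithmetic_intensity < ridge_point:
        return "memory bandwidth"
    return "compute"


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
    PEAK_GFLOPS = 1000.0
    PEAK_BW_GBS = 100.0
    for ai in (0.25, 20.0):
        bound = attainable_gflops(ai, PEAK_GFLOPS, PEAK_BW_GBS)
        limit = limiting_resource(ai, PEAK_GFLOPS, PEAK_BW_GBS)
        print(f"AI={ai} FLOP/byte: at most {bound} GFLOP/s, limited by {limit}")
```

On this assumed machine, a kernel at 0.25 FLOP/byte is capped by memory bandwidth, so reducing data movement is the productive change; a kernel already near the compute ceiling has exhausted that resource and would need a different algorithm to go faster.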

Boosting Manycore Code Optimization Efforts with Roofline Technology

A software toolkit developed at Berkeley Lab to better understand supercomputer performance is now being used to boost application performance for researchers running codes at NERSC and other supercomputing facilities. “Since its initial development, what is now known as the Empirical Roofline Toolkit (ERT) has benefitted from contributions by several Berkeley Lab staff. Along the way, HPC users who write scientific applications for manycore systems have been able to apply the toolkit to their applications and see how changing parameters of their code can improve performance.”

A New Way to Visualize Performance Optimization Tradeoffs

A valuable feature of Intel Advisor is its Roofline Analysis Chart, which provides an intuitive and powerful visualization of actual performance measured against hardware-imposed performance ceilings. Intel Advisor’s vector parallelism optimization analysis and memory-versus-compute roofline analysis, working together, offer a powerful tool for visualizing an application’s complete current and potential performance profile on a given platform.

Exascale: ALCF and Intel to Host Aurora Learning Paths Series

The Argonne Leadership Computing Facility (ALCF), in partnership with Intel, will host the ALCF Aurora Learning Paths series to explore the use of oneAPI and Data Parallel C++ (DPC++), Intel’s open-source implementation of SYCL, and to demonstrate methods to achieve performant, portable code across five platforms available on the Intel DevCloud. There are four modules within the […]

An Update on Aurora Exascale Readiness at Argonne

For a number of years, Intel has provided the Advisor and VTune performance profiling software for CPU-based Intel architectures. In advance of the delivery of the ALCF’s Intel-HPE exascale system, Aurora, in 2022, Intel has been extending those tools to its Xe GPU architecture. Because the Aurora software development kit (SDK) incorporates multiple major programming models—such as DPC++/SYCL, OpenMP Target offloading (for C, C++ and Fortran), OpenCL, Kokkos, and RAJA—those tools need to be tested across every combination of programming model. Many important applications being developed under DOE’s Exascale Computing Project (ECP) integrate Intel’s optimized math libraries to maximize their performance; it is therefore crucial that Advisor and VTune are able to capture their performance characteristics seamlessly.

NERSC Finalizes Contract for Perlmutter Supercomputer

NERSC has moved another step closer to making Perlmutter — its next-generation GPU-accelerated supercomputer — available to the science community in 2020. In mid-April, NERSC finalized its contract with Cray — which was acquired by Hewlett Packard Enterprise (HPE) in September 2019 — for the new system, a Cray Shasta supercomputer that will feature 24 […]

Latest Release of Intel Parallel Studio XE Delivers New Features to Boost HPC and AI Performance

Intel Parallel Studio XE is a complete software development suite that includes highly optimized compilers and math and data analytics libraries, along with comprehensive tools for performance analysis, application debugging, and parallel processing. It’s available as a download for Windows, Linux, and macOS. “With this release, the focus is on making it easier for HPC and AI developers to deliver fast and reliable parallel code for the most demanding applications.”