ALCF Developer Session May 24: Preparing XGC and HACC to Run on the Aurora Exascale Supercomputer


May 1, 2023: An Argonne Leadership Computing Facility (ALCF) Developer Session on strategies for porting two applications, the XGC gyrokinetic plasma physics code and the HACC cosmology code, to ALCF's upcoming Aurora exascale-class supercomputer will be held from 11 a.m. to noon CT on Wednesday, May 24, 2023. Registration is here.

The speakers will be Esteban Rangel, assistant computer scientist, and Aaron Scheinberg of the ALCF. Scheinberg is a computational scientist and consultant focusing on exascale computing, scientific application performance, particle-based methods, magnetic fusion simulations, and GPU programming. Rangel joined the Computational Science (CPS) division at Argonne National Laboratory as a staff scientist in July 2021. Before that, he was a postdoc at the ALCF, a position he took after receiving his PhD in computer science from Northwestern University in 2018. He began contributing to the HACC codebase as a graduate student, and much of his PhD thesis work involved designing and implementing scalable analysis software for N-body cosmological simulations.

They will discuss the lessons learned and the tools that proved crucial in porting these applications to Argonne's exascale machine. For the XGC portion of the talk, Scheinberg will cover experience from running on diverse new machines (Polaris, Sunspot, and, most recently, Frontier), the unique challenges of Aurora, and how these inform the team's plans as Aurora becomes available.

For the HACC portion, Rangel will cover the tools and development strategies used to port HACC from CUDA to SYCL, the challenges of supporting multiple codebases (CUDA/HIP/SYCL) in HACC, and the optimizations made to improve performance for the Intel Xe GPUs.

Over the course of the Exascale Computing Project (ECP), the gyrokinetic plasma physics code XGC has been offloaded almost entirely to GPUs via the Kokkos and Cabana libraries. Beyond accelerating computation, the team has found that communication patterns and memory usage must remain highly flexible to keep a single code base performant across architectures and scales. The XGC portion of the talk will also review the progress made to date.
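To illustrate what offloading through a portability layer such as Kokkos means in general terms, the sketch below writes a kernel body once as a lambda and lets the chosen back end decide how to iterate over it. It does not use Kokkos and is not XGC code: `SerialSpace`, `BlockedSpace`, `parallel_for`, and `push_particles` are hypothetical stand-ins, and the "particle push" is a toy update assumed only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical execution-space tags standing in for real back ends
// (they are NOT the Kokkos API).
struct SerialSpace {};
struct BlockedSpace {
    std::size_t block_size;  // mimics a GPU thread-block size
};

// Serial back end: one plain loop over all indices.
template <class Body>
void parallel_for(SerialSpace, std::size_t n, const Body& body) {
    for (std::size_t i = 0; i < n; ++i) body(i);
}

// "Blocked" back end: walks the index space in fixed-size blocks,
// the way a GPU back end would map indices onto thread blocks.
template <class Body>
void parallel_for(BlockedSpace space, std::size_t n, const Body& body) {
    assert(space.block_size > 0);
    for (std::size_t start = 0; start < n; start += space.block_size) {
        for (std::size_t t = 0; t < space.block_size && start + t < n; ++t) {
            body(start + t);
        }
    }
}

// Toy "particle push" x += v * dt, written once; the back end is a
// template parameter chosen at compile time by the caller.
template <class Space>
void push_particles(Space space, std::vector<double>& x,
                    const std::vector<double>& v, double dt) {
    assert(x.size() == v.size());
    double* xp = x.data();
    const double* vp = v.data();
    parallel_for(space, x.size(),
                 [=](std::size_t i) { xp[i] += vp[i] * dt; });
}
```

Calling `push_particles(SerialSpace{}, x, v, 0.5)` and `push_particles(BlockedSpace{2}, x2, v, 0.5)` on identical inputs produces identical results; only the iteration strategy differs. In Kokkos itself the analogous call is `Kokkos::parallel_for(Kokkos::RangePolicy<ExecSpace>(0, n), KOKKOS_LAMBDA(int i) { ... })`, where the execution space selects the CUDA, HIP, SYCL, or host back end.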

HACC uses CUDA as its GPU programming model. Because CUDA is a proprietary language, the application developers have had to convert their kernels to a programming model suitable for Aurora. The HACC portion of the talk will describe how the team ported HACC from CUDA to SYCL while continuing to support CUDA and HIP versions of the code.
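As a generic illustration of one mechanical step in any CUDA-to-SYCL port (not HACC's code, and not a description of the tools the HACC team used), the sketch below translates a 2-D CUDA launch configuration into the sizes a SYCL `nd_range<2>` expects. Two differences matter: SYCL's global range counts work-items rather than work-groups, and SYCL's last dimension varies fastest, so CUDA's `x` maps to SYCL dimension 1. The `Dim3`, `NdRange2`, and helper function names are hypothetical host-side stand-ins so the arithmetic can be checked without a SYCL compiler.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CUDA's dim3 (only x and y are used here).
struct Dim3 {
    std::size_t x = 1, y = 1, z = 1;
};

// Hypothetical description of a SYCL 2-D launch: {dim0, dim1},
// where dim1 is the fastest-varying dimension.
struct NdRange2 {
    std::array<std::size_t, 2> global;  // total work-items per dimension
    std::array<std::size_t, 2> local;   // work-group size per dimension
};

// Translate kernel<<<grid, block>>> into nd_range<2> sizes:
//  1. multiply grid by block, because SYCL's global range counts work-items;
//  2. reverse the dimension order, so CUDA's x becomes SYCL dimension 1.
NdRange2 to_nd_range(Dim3 grid, Dim3 block) {
    return NdRange2{
        {grid.y * block.y, grid.x * block.x},
        {block.y, block.x},
    };
}

// Global x index as a CUDA kernel computes it:
// blockIdx.x * blockDim.x + threadIdx.x
std::size_t cuda_global_x(std::size_t block_idx_x, std::size_t block_dim_x,
                          std::size_t thread_idx_x) {
    return block_idx_x * block_dim_x + thread_idx_x;
}

// The same value from SYCL-style ids along dimension 1, mirroring
// item.get_group(1) * item.get_local_range(1) + item.get_local_id(1)
// (which equals item.get_global_id(1) when no offset is used).
std::size_t sycl_global_id_dim1(const NdRange2& r, std::size_t group1,
                                std::size_t local_id1) {
    return group1 * r.local[1] + local_id1;
}
```

For a launch with `grid = {4, 2}` and `block = {64, 8}`, `to_nd_range` yields a global range of `{16, 256}` and a local range of `{8, 64}`. In real SYCL code these values would be passed as `sycl::nd_range<2>(sycl::range<2>(global[0], global[1]), sycl::range<2>(local[0], local[1]))` to `queue::parallel_for`.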