May 15, 2025 — The Argonne Leadership Computing Facility will host an overview of key AI frameworks, toolkits, and strategies to achieve high-performance training and inference on the Aurora exascale supercomputer for scientific applications.
This virtual session will be held from 11 a.m. to noon CT on Wednesday, May 28. Register here.
Members of the ALCF AI/ML team at Argonne National Laboratory will present examples of using PyTorch and TensorFlow on Aurora, followed by distributed training at scale using PyTorch with Distributed Data Parallel (DDP) and TensorFlow with Horovod, all driven by the oneCCL communication library.
Additionally, the speakers will discuss using Python on Intel’s GPUs with Data Parallel Extensions for Python (DPEP). The webinar will also share best practices for profiling code and identifying bottlenecks in order to maximize GPU performance.