KAUST and Cerebras Named Gordon Bell Award Finalist


SUNNYVALE, Calif. and THUWAL, Saudi Arabia, Sept. 20, 2023 — Saudi Arabia’s King Abdullah University of Science and Technology (KAUST) and AI chip developer Cerebras Systems announced that their joint work on multi-dimensional seismic processing has been selected as a finalist for the 2023 Gordon Bell Prize, which recognizes outstanding achievements in HPC.

By developing a Tile Low-Rank Matrix-Vector Multiplication (TLR-MVM) kernel that takes advantage of the unique architecture of the Cerebras CS-2 systems in the Condor Galaxy AI supercomputer, built by Cerebras and its strategic partner G42, researchers at KAUST and Cerebras achieved production-grade accuracy for seismic applications at a record-breaking sustained bandwidth of 92.58 PB/s, highlighting how AI-customized architectures can enable a new generation of seismic algorithms.

“In partnership with KAUST researchers, we are honored to be recognized by Gordon Bell for setting a new record in what is possible for multi-dimensional seismic processing. This work will unlock world-changing advancements across climate and weather modeling, computational astronomy, wireless communication, seismic imaging and more,” said Andrew Feldman, co-founder and CEO of Cerebras Systems. “This is the third year in a row that Cerebras, alongside its partners, has been selected as a finalist for the distinguished Gordon Bell Prize, and we plan to continue delivering groundbreaking innovations in the years to come.”

Seismic applications are vital in shaping our understanding of Earth’s resources and can accelerate the world toward a low-carbon future. Seismic processing of geophysical data collected from the Earth’s subsurface enables us to identify buried hydrocarbon reservoirs, drill for oil with greater accuracy, and optimize for CO2 sequestration sites by identifying potential leakage risks. Modern seismic processing techniques are computationally challenging because they require repeated access to the entire collection of multi-dimensional data. This problem, commonly known as time-domain Multi-Dimensional Deconvolution (MDD), has become tractable thanks to compression techniques that relax the inherent memory and computational burden, such as Tile Low-Rank Matrix-Vector Multiplication (TLR-MVM).
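As a rough illustration of the tile low-rank idea (a sketch under stated assumptions, not the authors' kernel): each tile of the data matrix is replaced by truncated-SVD factors of low numerical rank k, so a matrix-vector product touches on the order of b·k values per b×b tile instead of b². A minimal NumPy version, with the tolerance, tile size, and test matrix chosen purely for the example:

```python
import numpy as np

def compress_tile(tile, tol=1e-6):
    """Approximate a dense tile as U @ Vt via a truncated SVD."""
    U, s, Vt = np.linalg.svd(tile, full_matrices=False)
    k = max(1, int(np.sum(s > tol * s[0])))  # numerical rank of the tile
    return U[:, :k] * s[:k], Vt[:k, :]       # fold singular values into U

def tlr_mvm(tiles, x, b):
    """y = A @ x, with A stored as an n x n grid of (U, Vt) tile factors."""
    n = len(tiles)
    y = np.zeros(n * b)
    for i in range(n):
        for j in range(n):
            U, Vt = tiles[i][j]
            # Two skinny products per tile, O(b*k) each, instead of O(b^2).
            y[i*b:(i+1)*b] += U @ (Vt @ x[j*b:(j+1)*b])
    return y

# Build a smooth (hence compressible) test matrix and compare to a dense MVM.
b, n = 32, 4                                             # tile size, tiles/dim
idx = np.arange(n * b)
A = 1.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))    # smooth kernel
tiles = [[compress_tile(A[i*b:(i+1)*b, j*b:(j+1)*b]) for j in range(n)]
         for i in range(n)]
x = np.random.default_rng(0).standard_normal(n * b)
err = np.linalg.norm(tlr_mvm(tiles, x, b) - A @ x) / np.linalg.norm(A @ x)
```

The savings grow as the off-diagonal tiles become more compressible; real seismic frequency matrices additionally use complex precision and variable ranks per tile, which this toy example omits.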


Researchers at KAUST and Cerebras approached this problem by re-designing the TLR-MVM algorithm to take advantage of Cerebras’ CS-2 system, whose high memory throughput delivers strong performance for intrinsically memory-bound applications. The Cerebras CS-2 contains 850,000 AI compute cores and 40 GB of on-chip SRAM, making it well suited to problems that are memory bottlenecked and heavily parallelizable. The re-designed TLR-MVM algorithm was tested on an openly available 3D geological model and ran on 48 Cerebras CS-2 systems in the Condor Galaxy AI supercomputer, built by Cerebras and its strategic partner G42. Researchers reported accurate responses at a sustained bandwidth of 92.58 PB/s, three times the aggregated theoretical bandwidth of Leonardo or Summit, two of the world’s current top five supercomputers. This result also approaches the estimated upper bound (95.38 PB/s) for Frontier, another top-five supercomputer, at a fraction of the energy consumption. TLR-MVM running on Cerebras systems sustained an energy efficiency of 36.50 GFlops per watt, which compares favorably with Frontier’s 52 GFlops per watt. These results indicate that production-grade seismic applications can reach top-five supercomputer performance on Cerebras CS-2 systems at a fraction of the cost and energy consumption.

Researchers re-designed the TLR-MVM algorithm using communication-avoiding principles that favor local SRAM data motion over cross-fabric communication. They mapped the new algorithm onto the disaggregated memory resources and extracted the desired high memory bandwidth, delivering unprecedented processing capability for enhancing the imaging of seismic data acquired in complex geology. This was done using the Cerebras SDK, which offers low-level programmatic access to the Cerebras CS-2.
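Independent of the Cerebras SDK specifics (which are not reproduced here), the communication-avoiding idea can be sketched as follows: each tile's low-rank factors live with one worker, the expensive products are computed entirely from locally held data, and only small partial vectors cross the fabric in a final reduction. A hedged Python/NumPy illustration, with the tile size and fixed rank chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
b, n, k = 16, 3, 2   # tile size, tiles per dimension, illustrative fixed rank

# Each tile of the matrix is stored only as its low-rank factors (U, Vt),
# standing in for data held in one core's local SRAM.
tiles = [[(rng.standard_normal((b, k)), rng.standard_normal((k, b)))
          for _ in range(n)] for _ in range(n)]
x = rng.standard_normal(n * b)

def distributed_tlr_mvm(tiles, x, b):
    """Each (i, j) tile is 'owned' by one worker. Workers compute their
    contribution from locally held factors; only small b-length partial
    vectors cross the fabric during the final row-wise reduction."""
    n = len(tiles)
    partials = {(i, j): U @ (Vt @ x[j*b:(j+1)*b])      # local compute only
                for i in range(n)
                for j, (U, Vt) in enumerate(tiles[i])}
    y = np.zeros(n * b)
    for (i, _), p in partials.items():                 # reduction phase
        y[i*b:(i+1)*b] += p
    return y

# Check against a dense assembly of the same matrix.
A = np.block([[U @ Vt for (U, Vt) in row] for row in tiles])
y = distributed_tlr_mvm(tiles, x, b)
```

The design choice being illustrated is that the large tile factors never move: each worker ships out only a b-length vector, so fabric traffic is proportional to the output size rather than the matrix size.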

“Disaggregated memory requires fine-grained algorithmic innovations,” said lead author Hatem Ltaief, Principal Research Scientist at KAUST’s Extreme Computing Research Center (ECRC). “Working with Cerebras engineers to deploy them and extract the hardware’s full potential was a dream.”

ECRC Director David Keyes added, “It is exciting to discover the versatility of wafer-scale hardware beyond neural network training for which it was conceived and optimized. We join other examples of such architectural crossover in Gordon Bell Prize history.”

Other KAUST co-investigators include Professor Matteo Ravasi of Earth and Environmental Sciences and Engineering and 2023 KAUST Computer Science PhD graduate Yuxi Hong, who now exercises his HPC skills in Lawrence Berkeley National Laboratory’s Exascale Computing Program.

Performing TLR-MVM has been challenging on traditional hardware: it demands significant computational power and yields slow processing times, even on modern supercomputers. TLR-MVM also requires large amounts of memory to store matrices and intermediate results, straining the limits of conventional CPU and GPU hardware. Finally, it must be heavily parallelized to achieve industrial-grade performance. While GPUs are normally well suited to parallel processing, their limited support for batched execution over complex-precision data with variable tile ranks reduces their practical efficiency for this application. The work conducted by KAUST and Cerebras validates the Cerebras CS-2 as a viable alternative that can achieve record-breaking performance for historically memory-bound applications.

More information on this multi-dimensional seismic processing workload can be found at https://repository.kaust.edu.sa/handle/10754/694388. This research was made possible by G42’s grant of system time on the Condor Galaxy 1 AI supercomputer.