Scientists from the Heat and Mass Transfer Technological Center (CTTC) at the Technical University of Catalonia in Spain have harnessed the extreme performance of the Mira supercomputer with their in-house multi-physics CFD code, aided by a collaboration between Allinea Software and Argonne National Laboratory on scalable debugging for the high-end system.
After participating in the Argonne Training Program for Extreme Scale Computing (ATPESC) at the Argonne Leadership Computing Facility (ALCF) in the US, scientists from the CTTC had the opportunity to use ALCF’s Mira – a 10-petaflops IBM Blue Gene/Q supercomputer equipped with 786,432 processor cores and 768 terabytes of memory – to test the performance and scalability of their CFD code, TermoFluids. The focus was on the first level of parallelization, which is based on a distributed memory model and MPI communications.
“Before accessing Mira, TermoFluids had been used on production simulations up to around 5,000 CPU cores and scalability tests up to 10,000 CPU cores,” Ricard Borrell, the research manager behind TermoFluids, commented. “On Mira we have increased this figure by an order of magnitude and have now run the code up to 131,072 CPU cores. Not only did this include the most time-consuming part of the simulations, i.e. the time-integration, but other aspects that can become critical overheads such as the pre-processing, the simulation set-up, and IO operations for checkpointing as well.”
On Mira, the code could be run on much larger problems, with up to billions of unknowns. This required changing the type of some integer variables, however, so that their values no longer fell out of range. With this order-of-magnitude leap in both problem size and the number of parallel processes, Borrell encountered new problems in the code that appeared only at the larger scale. Because the issues couldn’t be reproduced at a smaller scale, the team turned to the Allinea DDT debugger to find the bugs.
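The integer-range issue mentioned above can be illustrated with a minimal sketch. It is not taken from TermoFluids; the problem size and the helper names (`fits_in_int32`, `as_int32`, `as_int64`) are hypothetical and chosen only for illustration. It assumes that a global count of unknowns was held in a signed 32-bit integer, which tops out at 2,147,483,647, and shows what happens once the count passes that limit.

```python
import ctypes

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def fits_in_int32(n):
    """Return True if n can be stored in a signed 32-bit integer."""
    return INT32_MIN <= n <= INT32_MAX


def as_int32(n):
    """Value that results when n is forced into a signed 32-bit integer.

    ctypes performs no overflow checking, so out-of-range values wrap around,
    which mimics what a 32-bit index variable would hold in compiled code.
    """
    return ctypes.c_int32(n).value


def as_int64(n):
    """Value that results when n is stored in a signed 64-bit integer."""
    return ctypes.c_int64(n).value


# Hypothetical global problem size: three billion unknowns (an assumption).
global_unknowns = 3_000_000_000

print(fits_in_int32(global_unknowns))  # False: beyond the 32-bit range
print(as_int32(global_unknowns))       # wraps around to a negative number
print(as_int64(global_unknowns))       # a 64-bit type holds the value intact
```

A wrapped, negative index of this kind only shows up once the problem is large enough, which is one plausible reason such bugs cannot be reproduced in small test runs.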
“Debuggers are essential tools for our users as they scale their applications on Mira, and there have been several instances where users have leaned on a debugger to find issues as they have scaled on the system,” Kalyan Kumaran, Manager, Performance Engineering at ALCF, explained. “Allinea scaled their debugger to perform well on leadership class systems like ours. This helped us to choose this tool as we were looking for a debugger that would scale to the entire Mira system.” He added that as most ALCF users access the systems remotely, a remote connection client such as Allinea’s is important for ease of use.
“High-performance computing resources like Mira give developers like us the power to go further,” added Borrell. “Extracting that full performance is critical if you want to handle more complex problems, finer resolutions and achieve new frontiers – and you’ll need Allinea DDT to do it.”
Read more about how this was achieved in their case study.