Today ECMWF launched the ESCAPE-2 project on energy-efficient scalable algorithms for weather and climate prediction at exascale. The effort will build on the success of the ESCAPE project, which has achieved remarkable gains in computing efficiency by developing the concept of weather and climate dwarfs.
Like its predecessor, ESCAPE-2 is a three-year project coordinated by ECMWF and funded by the European Commission’s Horizon 2020 Future and Emerging Technologies for High-Performance Computing (FET-HPC) program. It brings together 12 partners, including national meteorological and hydrological services, HPC centers, hardware vendors and universities. The ESCAPE project aims to prepare numerical weather prediction (NWP) and climate models for new computing architectures towards exascale computing, with a focus on energy efficiency.
The project developed the concept of fundamental building blocks called dwarfs. Dwarfs represent functional units in the forecasting model, such as an advection or a physics parametrization scheme, which also come with specific computational patterns for processor memory access and data communication.
Assessing numerical methods and algorithms for dwarfs rather than entire models reduces the complexity of the code. It enables HPC centers, research groups and hardware vendors to focus on specific aspects of performance for which code restructuring and adaptation to novel processor architectures is more straightforward.
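To make the idea concrete, here is a minimal sketch of what a stand-alone dwarf might look like. It is not taken from the ESCAPE code: the function names, the first-order upwind scheme and the one-dimensional periodic grid are all assumptions chosen for brevity. The point is only that a single functional unit (here, advection) can be isolated with its own small driver, so that its memory-access pattern can be studied and tuned without the rest of the model.

```python
def upwind_advection_step(q, courant):
    """Advance a periodic 1D field by one first-order upwind step.

    q       -- list of cell values (hypothetical tracer field)
    courant -- u*dt/dx, assumed constant and within [0, 1] for stability
    """
    if not 0.0 <= courant <= 1.0:
        raise ValueError("upwind scheme needs 0 <= courant <= 1")
    # q[i - 1] wraps to q[-1] when i == 0, which gives periodic boundaries.
    return [q[i] - courant * (q[i] - q[i - 1]) for i in range(len(q))]


def run_dwarf(n_cells=64, n_steps=100, courant=0.5):
    """Drive the kernel on its own, as a benchmark harness might."""
    # Initial condition: a square pulse in the first quarter of the domain.
    q = [1.0 if i < n_cells // 4 else 0.0 for i in range(n_cells)]
    for _ in range(n_steps):
        q = upwind_advection_step(q, courant)
    return q
```

Because the kernel only reads each cell and its upstream neighbor, its access pattern is a simple nearest-neighbor stencil, which is the kind of property a dwarf is meant to expose.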
The codes were optimized for different types of Intel CPU and NVIDIA GPU processors, and for a new technique particularly suited to performing Fourier transformations with an optical device.
For spectral transforms on CPUs, efficiency gains of up to 40% were achieved. Code optimization for GPUs delivered speed-up factors of about 10 to 50 on a single node, and a further factor of 2 to 3 when deployed on multiple GPUs connected by NVSwitch.
However, using accelerators for only a small part of the code forfeits much of the benefit in overall cost if the CPUs sit idle while the accelerators perform their computations.
Ideally, a large part of the code is moved to the accelerator, or computations on the host-CPUs are overlapped with computations on the accelerator. Fully implementing either option will require further work.
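The trade-off can be illustrated with a back-of-the-envelope calculation in the style of Amdahl's law. The sketch below is an illustration, not a measurement: the offloaded fractions are hypothetical, the kernel speed-up of 10 is simply the lower end of the range quoted above, and data-transfer costs are ignored.

```python
def overall_speedup(offload_fraction, kernel_speedup):
    """Whole-code speed-up when the CPU idles during offloaded work.

    offload_fraction -- share of the original runtime moved to the accelerator
    kernel_speedup   -- speed-up of that share on the accelerator
    """
    remaining = 1.0 - offload_fraction
    return 1.0 / (remaining + offload_fraction / kernel_speedup)


def overlapped_speedup(offload_fraction, kernel_speedup):
    """Idealized speed-up when host and accelerator work fully overlap."""
    cpu_time = 1.0 - offload_fraction
    accelerator_time = offload_fraction / kernel_speedup
    # With perfect overlap, the slower of the two sides sets the runtime.
    return 1.0 / max(cpu_time, accelerator_time)


if __name__ == "__main__":
    for fraction in (0.1, 0.5, 0.9):
        print(fraction,
              round(overall_speedup(fraction, 10.0), 2),
              round(overlapped_speedup(fraction, 10.0), 2))
```

Under these assumptions, a tenfold kernel speed-up applied to 10% of the runtime improves the whole code by only about a factor of 1.1, whereas offloading 90% gives roughly a factor of 5. This is why moving a large part of the code, or overlapping host and accelerator work, matters more than the speed of any single kernel.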
Domain-specific languages (DSLs) were another focus of the ESCAPE project. They are a promising tool for achieving good performance on multiple architectures while maintaining a single, portable code base.
Designing a DSL that is user-friendly whilst delivering good performance on each architecture is still a challenge.
Tests with a dwarf calculating the advection of air showed a speed improvement by a factor of 2 compared to the manually adapted version on GPUs.
Beyond code adaptation and optimization, a range of numerical methods exploiting multi-grid solvers and different types of spatial discretization and time stepping have been investigated. This work will support ECMWF’s development of the finite-volume module, FVM, which offers an alternative to the currently operational spectral-transform dynamical core of the Integrated Forecasting System (IFS).
It also supports strategies of ECMWF’s Member and Co-operating States for their limited-area applications.
ESCAPE-2 will extend the work on dwarfs to other models, such as the German national meteorological service’s ICON model and the community ocean model NEMO.
It will ultimately develop benchmarks that represent the computing and data handling patterns of weather and climate models more realistically and are thus more suitable for assessing the performance of future HPC systems.
This work will be relevant for future procurements and will also guide the performance assessment of future HPC systems of the EuroHPC Joint Undertaking.
ESCAPE-2 will combine cross-disciplinary uncertainty quantification tools (URANIE) for HPC, originating from the energy sector, with ensemble-based weather and climate models to quantify the effect of model- and data-related uncertainties on forecasting in a cost-effective way.
The mathematical and algorithmic research in ESCAPE-2 will focus on implementing data structures and tools supporting parallel computation of dynamics and physics on multiple scales and multiple levels.
Highly-scalable spatial discretization will be combined with proven large time-stepping techniques to optimize both time-to-solution and energy-to-solution.
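The article does not say which time-stepping techniques are meant. As one well-known example of a scheme that tolerates large time steps, the sketch below shows semi-Lagrangian advection on a one-dimensional periodic grid. The function name, the linear interpolation and the constant wind are assumptions made for illustration only.

```python
import math


def semi_lagrangian_step(q, courant):
    """One semi-Lagrangian step with linear interpolation on a periodic grid.

    courant = u*dt/dx may exceed 1: each new value is fetched from its
    departure point, which lies `courant` grid cells upstream.
    """
    n = len(q)
    new_q = []
    for i in range(n):
        departure = i - courant           # departure point in index space
        left = math.floor(departure)      # grid point just left of it
        weight = departure - left         # fractional distance past `left`
        # Python's % keeps negative indices inside [0, n), so the grid wraps.
        new_q.append((1.0 - weight) * q[left % n]
                     + weight * q[(left + 1) % n])
    return new_q
```

Because the new value is an interpolation between existing values, the step stays bounded even at a Courant number of 2.5, where an explicit upwind scheme would be unstable. Fewer steps are then needed to cover the same simulated period, although each step costs more and interpolation accuracy becomes the limiting factor.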
Connecting multi-grid tools, iterative solvers, and overlapping computations with flexible-order spatial discretization will strengthen algorithm resilience against soft or hard failure. In addition, machine learning techniques will be applied to accelerate complex sub-components.
The intended outcome is a solution which combines performance, resilience and accuracy with portability.
This story appears here as part of a cross-publishing agreement with Scientific Computing World.