DOE: E4S for Extreme-Scale Science Now Supports Nvidia Grace and Grace Hopper GPUs

E4S, the open source Extreme-Scale Scientific Software Stack for HPC-AI scientific applications, now incorporates AI/ML libraries and expands GPU support to include the Nvidia Grace and Grace Hopper architectures.

The new E4S release, version 24.02, supports the mission of the US Department of Energy’s Exascale Computing Project (ECP) to expand usability across different GPU platforms while maintaining its portfolio approach (for a discussion of this dual-faced approach, see this article).

“We are excited about the emerging role of E4S in the integrated use of HPC and AI to address the frontiers of scientific discovery and to complement our industry partners,” said Mike Heroux, E4S project leader and senior scientist at Sandia National Laboratories. “The incorporation of standard AI/ML libraries such as PyTorch, TensorFlow, JAX, and Horovod along with emerging scientific ML libraries ensures the research community has a trusted and robust capability now and in the future.”

E4S is a community effort to provide open-source software packages on high-performance computing (HPC) and AI resources and cloud platforms. By the numbers, the 24.02 release provides over 120 HPC packages. Supported architectures include all major GPU platforms from Nvidia, AMD, and Intel and the CPU platforms ARM, x86_64, and ppc64le. To accelerate the build process, there are over 123,000 binaries in E4S’s Spack build cache.

Support for Grace and Grace Hopper Architectures

Support for Nvidia’s Grace and Grace Hopper architectures expands upon E4S’s existing support for other Nvidia and AMD GPUs for HPC-AI workloads. Intel GPU support continues to expand with support for additional HPC workloads (see below).

This latest E4S release supports a variety of Nvidia and AMD GPU-accelerated AI frameworks, including TorchBraid, LBANN, TensorFlow, PyTorch, JAX, Pandas, Scikit-Learn, OpenCV, Horovod, Keras, and OpenAI, thus enabling a broad collection of accelerated HPC-AI workloads.

All Python packages can be accessed via Jupyter Notebooks. This release contains updated Python tools, including Seaborn and Plotly.
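As a rough illustration of working with these packages from a notebook, the sketch below checks which of the named libraries can be imported in the current Python environment. It is not part of E4S. The helper name `available_packages` is hypothetical, and the import names (for example `torch` for PyTorch, `sklearn` for Scikit-Learn, `cv2` for OpenCV) are the usual module names for these projects, which a given E4S installation may or may not expose.

```python
import importlib.util
from typing import Dict, Iterable

# Usual top-level import names for packages mentioned in the article.
# Whether each one is present depends on the local installation.
CANDIDATE_MODULES = [
    "torch", "tensorflow", "jax", "pandas", "sklearn",
    "cv2", "horovod", "keras", "seaborn", "plotly",
]


def available_packages(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to True if it can be located."""
    # find_spec returns None when a top-level module cannot be found,
    # and it does not import the module itself.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in available_packages(CANDIDATE_MODULES).items():
        print(f"{name:12s} {'available' if found else 'missing'}")
```

Run in a Jupyter cell or as a script, this prints one line per package and does not import the heavy libraries themselves.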

Left: Mike Heroux, Sandia National Laboratories
Right: Sameer Shende, University of Oregon

To ensure correct operation on the new Nvidia architectures, the University of Oregon’s Frank cluster now has both Grace- and Grace Hopper-equipped nodes for continuous integration (CI) and continuous deployment (CD) verification. Sameer Shende, E4S project lead, research professor, and director of the Performance Research Lab at the University of Oregon, said that HPC groups and organizations can contact him via the E4S project to gain access to the Frank cluster (short for Frankenstein) and use the CI/CD hardware that is not subject to NDA.

Expanded HPC Application and Intel GPU Capabilities

The 24.02 release adds support for GROMACS and CP2K in addition to the already supported Xyce, Quantum ESPRESSO, ExaGO, LAMMPS, WarpX, deal.II, and OpenFOAM applications. This release also includes support for Intel oneAPI 2024.0.2 software (BaseKit and HPCToolkit) in containers on x86_64 platforms, support for HPC packages built with Intel compilers and Intel MPI, and support for Intel’s Data Center GPU Max Series (Ponte Vecchio).

Cloud-Oriented Features

Cloud users and those considering HPC-AI in the cloud will benefit from the cloud-oriented features of E4S and the commercial E4S Pro included in the 24.02 release. Heroux noted, “E4S is provided in the cloud to facilitate evaluation and multiparty collaboration as well as use by industry and others to run HPC-AI workloads in the cloud.”

The creation of the E4S Pro images was funded by a DOE Small Business Innovation Research grant. The cloud E4S Pro containers focus on “all scale” multi-node deployments. These images are now available on AWS Marketplace and on AWS GovCloud.

Adaptive Computing’s On-Demand Data Center (ODDC) platform uses the multi-node E4S Pro images from ParaTools, Inc. to enable their users to launch on AWS, GCP, OCI, and Azure with support for VNC (Virtual Network Computing)-based remote desktop (https://adaptivecomputing.com/cherry-services/adaptive-ai-as-a-service/).

Adaptive Computing can also do bare-metal, on-premises deployments on their own hardware. Adaptive Computing CEO Art Allen explained, “E4S and Adaptive Computing’s ODDC provide a consistent platform for both multi-cloud and on-premises HPC-AI deployments using a trusted high-performance software stack. ODDC enables launching jobs on multiple nodes through a web browser and supports a responsive remote desktop environment in E4S based on VNC.”

A New Tool: e4s-chain-spack.sh

Shende explained the reasoning behind the e4s-chain-spack tool: “Users can augment a read-only base install to create custom installations. The ability to have two simultaneous Spack installations makes it easy for a systems administrator to deploy a read-only container base deployment. Users can then add custom packages to meet their needs without incurring software bloat because the package dependencies can be fulfilled by packages in the read-only deployment.”

He continued, “This works especially well with new users who leverage a containerized deployment, which can then be chained with a custom install so the users can get exactly what they want without experiencing software bloat. It also addresses fulfilling complex software dependencies in modern packages—the user does not have to rebuild a myriad of additional packages by themselves, thus greatly simplifying deployment and alleviating package-dependency headaches.”
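The article does not describe how e4s-chain-spack.sh works internally. For readers unfamiliar with the idea, Spack itself documents a chaining mechanism in which a user-owned Spack instance lists one or more read-only "upstream" installations in an `upstreams.yaml` file and reuses their packages as dependencies. The sketch below only generates such a file's contents. It assumes that mechanism is the relevant one, and the helper name `render_upstreams_yaml`, the label `e4s-base`, and the path shown are hypothetical placeholders.

```python
from typing import Mapping


def render_upstreams_yaml(upstreams: Mapping[str, str]) -> str:
    """Render the text of a Spack upstreams.yaml file.

    `upstreams` maps an arbitrary label to the install tree of a
    read-only Spack instance (typically <spack root>/opt/spack).
    """
    lines = ["upstreams:"]
    for name, install_tree in upstreams.items():
        lines.append(f"  {name}:")
        lines.append(f"    install_tree: {install_tree}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical label and path for a read-only base installation.
    print(render_upstreams_yaml({"e4s-base": "/opt/e4s/spack/opt/spack"}), end="")
```

With a file like this in the user's Spack configuration, installing a new package in the user-owned instance can pick up dependencies already present in the read-only base rather than rebuilding them, which is the behavior Shende describes.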

Integrated Development Features

The 24.02 release of E4S also includes the VSCodium (MIT License) Integrated Development Environment, which is a community-driven, freely licensed binary distribution of Microsoft’s editor VS Code. This expands the existing support in E4S for other interactive development environments such as Jupyter notebooks.

Heroux elaborated on the uniqueness of E4S for government, academic, and industry users: “E4S represents the largest collection of performance-portable scientific libraries and tools for accelerated platforms in the world.” According to Heroux, accelerators put HPC and industry users back on the commodity power/performance curve, unlocking significant gains in hardware performance and efficiency (up to 100×).

Shende described the capabilities E4S provides: “E4S is a curated Spack-based distribution of tools that is extensible, customizable, and lowers the barriers to entry for HPC-AI developers. It is a software stack that scales from laptops and desktops to departmental clusters to supercomputers and beyond to commercial cloud platforms. It provides a consistent environment for developing and deploying the next generation of high-performance applications that can easily leverage GPUs.”

source: Exascale Computing Project