AMD-Supermicro-Cornelis (Omni-Path) ‘Mammoth’ Cluster at LLNL Targets COVID-19

Lawrence Livermore National Laboratory and partners AMD, Supermicro and Cornelis Networks have installed a high performance computing cluster with expanded memory and data storage capabilities targeting data-intensive COVID-19 research workloads. The Cornelis interconnect used in the “Mammoth” cluster is based on Intel Omni-Path Architecture (OPA) technology, which Intel announced last year it would no longer support and which reemerged in September under Cornelis Networks, an Intel spin-out with $20M in venture backing.

Funded by the $2.2 trillion federal Coronavirus Aid, Relief and Economic Security (CARES) Act, the cluster will be used by LLNL scientists working on COVID-19 to perform genomics analysis, nontraditional HPC simulations and graph analytics, including work on the development of antiviral drugs and designer antibodies.

Mammoth comprises 64 nodes outfitted with second-generation AMD EPYC CPUs. Each node has two 64-core CPUs, for 128 cores per node, along with high memory bandwidth, 2 terabytes (TB) of DRAM and nearly 4 TB of nonvolatile memory. The extra memory afforded by Mammoth is critical for COVID-19 researchers, who must sift through massive databases of information.
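Taken together, those per-node figures work out to roughly 64 × 2 TB = 128 TB of aggregate DRAM and on the order of 64 × 4 TB ≈ 256 TB of nonvolatile memory across the cluster (aggregate estimates derived here from the stated per-node specifications, not figures quoted by LLNL).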

LLNL said researchers have begun applying Mammoth to the genome of SARS-CoV-2, the virus that causes COVID-19, to analyze how it evolves and to simulate its structural changes when mutations are introduced. Mammoth is designed to reduce the time required to perform some types of genomic analysis from a few days to a few hours, lab scientists said.


“The ability of large-memory systems to integrate genomic analysis with large-scale machine learning for predictive modeling of therapeutic response will be important for accelerating the development of effective new therapeutics,” said Jim Brase, LLNL’s deputy associate director for computing. “Mammoth will be integral for developing new tools to combat COVID-19, but also for fast response in a future pandemic.”

LLNL said Mammoth also is aiding in the design of modified antibodies for improved binding against the virus. “In our workflow, we compute binding free energies with Rosetta Flex, a code that was memory-limited on other machines to 12 or 16 simultaneous calculations per node,” said LLNL computer scientist Thomas Desautels. “Mammoth enables us to run 128 Rosetta Flex calculations simultaneously on a single node, increasing our throughput by a factor of about eight. Using Mammoth, we can afford to execute many more Rosetta calculations and accelerate our search for SARS-CoV-2 antibody designs.”
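The throughput figure follows directly from the per-node counts: 128 simultaneous calculations versus the 12 to 16 possible on memory-limited machines gives roughly 128 / 16 = 8 times the throughput. As a rough illustration only (this is not LLNL’s actual workflow, and Rosetta Flex does not expose this Python interface; the worker function below is a hypothetical stand-in), independent per-core calculations of this kind are commonly fanned out across a node with a process pool sized to its core count:

    # Illustrative sketch only: a generic pattern for running many independent,
    # memory-heavy calculations across the cores of a single node.
    # "binding_free_energy" is a hypothetical placeholder, not Rosetta Flex's API.
    from concurrent.futures import ProcessPoolExecutor

    def binding_free_energy(design_id: int) -> float:
        # Stand-in for one memory-intensive antibody-design calculation.
        return 0.0

    if __name__ == "__main__":
        designs = range(128)  # one task per core on a 128-core Mammoth node
        with ProcessPoolExecutor(max_workers=128) as pool:
            results = list(pool.map(binding_free_energy, designs))
        print(f"completed {len(results)} calculations")

On a smaller or lower-memory node the same pattern simply runs with a lower worker count, which is where the per-node memory limit Desautels describes comes into play.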

Mammoth is the first production AMD-based cluster connected with interconnect components from Cornelis Networks for HPC, high performance data analytics (HPDA) and AI.

“This new system makes a big difference in how we prepare our jobs for calculations and in their performance,” said researcher Adam Zemla, who is using Mammoth for virus mutant detection, clustering, structural modeling and molecular dynamics simulations. “With Mammoth I can easily process COVID-19 genomes without the need to split datasets into smaller chunks, like I have had to do on previous machines.”

San Jose-based information technology company Supermicro provided the hardware for Mammoth, including the racks, servers, motherboards and overall system integration.

“Our 1U A+ Ultra installed in the Mammoth cluster leverages the system’s 64-core AMD EPYC CPUs, high-performance interconnects and large memory footprint, all in just one unit of rack space,” said Charles Liang, president and CEO of Supermicro. “The A+ Ultra platform’s performance and storage flexibility are ideal for the computationally intensive workloads used in running advanced simulations and research projects for these scientific challenges.”

Mammoth was procured through MNJ Technologies, an information technology firm headquartered in Illinois that served as the sourcing and logistics partner for the Supermicro HPC solution.