Cray to Deliver First Exabyte HPC Storage System for Frontier Supercomputer

At ISC 2019, Cray announced plans to deliver the world’s first exabyte HPC storage system to Oak Ridge National Laboratory (ORNL). As part of the Frontier CORAL-2 contract with the U.S. Department of Energy (DOE) and ORNL, the next-generation Cray ClusterStor storage file system will be integrated into ORNL’s Frontier exascale supercomputer, built on Cray’s Shasta architecture.

“We are excited to continue our partnership with ORNL to collaborate in developing a next generation storage solution that will deliver the capacity and throughput needed to support the dynamic new research that will be done on the Frontier exascale system for years to come,” said John Dinning, chief product officer at Cray. “By delivering a new hybrid storage solution that is directly connected to the Slingshot network, users will be able to drive data of any size, access pattern or scale to feed their converged modeling, simulation and AI workflows.”

The storage solution is a new design for the data-intensive workloads of the exascale era and will be based on next-generation Cray ClusterStor storage and the Cray Slingshot high-speed interconnect. The storage portion of the previously announced Frontier contract is valued at more than $50 million, making it the largest single Cray ClusterStor win to date. The Frontier system is expected to be delivered in 2021.

The new storage solution will be based on the next generation of Cray’s ClusterStor storage line and will comprise over one exabyte (EB) of hybrid flash and high-capacity storage running the Lustre® parallel file system. One exabyte is 1,000 petabytes (or one quintillion bytes), enough capacity to store more than 200 million high definition movies. The storage solution will be directly connected to ORNL’s Frontier system via the Slingshot system interconnect to enable seamless scaling of diverse modeling, simulation, analytics and AI workloads running simultaneously on the system. The Frontier system is anticipated to debut in 2021 as the world’s most powerful computer, with a performance of greater than 1.5 exaflops.
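As a back-of-the-envelope check on those figures, a few lines of Python reproduce the scale; the roughly 5 GB-per-movie size that falls out is an implication of the stated numbers, not a detail from Cray:

    # Sanity check of the capacity claim: 1 EB in smaller decimal units,
    # and the implied movie size if 200 million movies fit in 1 EB.
    EXABYTE = 10**18          # one quintillion bytes
    PETABYTE = 10**15
    GIGABYTE = 10**9

    capacity = 1 * EXABYTE
    print(capacity / PETABYTE)                      # 1000.0 PB, as stated
    print(capacity / (200_000_000 * GIGABYTE))      # 5.0 GB per movie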

Compared to the storage for ORNL’s current Summit supercomputer, the next-generation solution offers more than four times the capacity (over 1 EB, or 1,000 PB, versus 250 PB) and more than four times the throughput (up to 10 TB/s versus 2.5 TB/s) of the existing Spectrum Scale-based storage system. The new Cray ClusterStor solution for ORNL will comprise over 40 cabinets of storage and provide more than 1 EB of total capacity across two tiers to support both random and streaming access: a primary flash tier for high-performance scratch storage and a secondary hard disk tier for high-capacity storage. The new storage system will serve as a center-wide resource at ORNL in support of the Frontier exascale system. It will be accessed through the Lustre global parallel file system with ZFS local volumes, all in a single global POSIX namespace, which will make it the largest single high-performance file system in the world.
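Lustre typically exposes this kind of tiering through OST pools, which group storage targets (for example, flash versus disk) and let files be directed to either tier. The sketch below is purely illustrative: the pool names and directory paths are hypothetical, not details of the ORNL deployment, though lfs setstripe -p is the standard Lustre mechanism for pool placement:

    # Hypothetical illustration of Lustre OST pools for a two-tier system.
    # Pool names ("flash", "disk") and directory paths are assumptions
    # made for this sketch, not the actual ORNL configuration.
    import subprocess

    def set_pool(path: str, pool: str) -> None:
        """Direct new files created under `path` to the named OST pool."""
        subprocess.run(["lfs", "setstripe", "-p", pool, path], check=True)

    # Scratch data lands on the flash tier; bulk project data on disk.
    set_pool("/lustre/frontier/scratch", "flash")
    set_pool("/lustre/frontier/project", "disk")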

HPC storage systems have traditionally used large arrays of hard disks accessed via large, predictable reads and writes. This is in stark contrast to AI and machine learning workloads, which typically mix random and sequential access across both small and large data sizes. As a result, traditional storage systems are poorly suited to running these workloads together: they struggle with the mixed access patterns, and they lack the intelligent high-speed system interconnect needed to move massive amounts of data on and off the supercomputer quickly enough for such diverse workloads to run simultaneously on exascale systems like Frontier.
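The access-pattern gap is easy to demonstrate. The toy benchmark below is our illustration, not an ORNL measurement: it writes a scratch file, then times one large sequential pass against the same volume of data read back as small random chunks. On hard disks the random pattern is typically many times slower (on an SSD, or when the file sits in the page cache, the gap shrinks):

    # Toy benchmark: sequential large reads vs. random small reads of the
    # same total volume. Sizes are deliberately small; real HPC I/O
    # studies use dedicated benchmarking tools such as IOR.
    import os, random, time

    PATH = "io_demo.bin"
    FILE_SIZE = 64 * 1024 * 1024   # 64 MiB scratch file
    CHUNK = 4096                   # 4 KiB "small" random reads

    with open(PATH, "wb") as f:
        f.write(os.urandom(FILE_SIZE))

    # Sequential: one pass in large 1 MiB reads.
    start = time.perf_counter()
    with open(PATH, "rb") as f:
        while f.read(1024 * 1024):
            pass
    seq = time.perf_counter() - start

    # Random: the same total volume as scattered 4 KiB reads.
    offsets = [random.randrange(0, FILE_SIZE - CHUNK)
               for _ in range(FILE_SIZE // CHUNK)]
    start = time.perf_counter()
    with open(PATH, "rb") as f:
        for off in offsets:
            f.seek(off)
            f.read(CHUNK)
    rand = time.perf_counter() - start

    print(f"sequential: {seq:.3f} s   random: {rand:.3f} s")
    os.remove(PATH)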

The next-generation ClusterStor-based storage solution addresses these challenges head on by providing a blend of flash and capacity storage to support complex access patterns, a powerful new software stack for improved manageability and tiering of data, and seamless scaling across both compute and storage through direct connection to the Slingshot high-speed network. Beyond scaling, attaching storage directly to the Slingshot network eliminates the storage routers required in most traditional HPC networks, resulting in lower cost, lower complexity and lower latency in the overall system, and thus higher application performance and ROI. Additionally, since Slingshot is Ethernet-compatible, it can interoperate seamlessly with existing third-party network storage as well as with other data and compute sources.

Cray’s Shasta supercomputers, ClusterStor storage and the Slingshot interconnect are quickly becoming the leading technology choices for the exascale era by combining the performance and scale of supercomputing with the productivity of cloud computing and full datacenter interoperability. The new compute, software, storage and interconnect capabilities being pioneered for leading research labs like ORNL are being productized as standard offerings from Cray for research and enterprise customers alike, with expected availability starting at the end of 2019.
