Let’s say you were to ask one of the millions and millions (ok, dozens and dozens) of HPC storage enthusiasts on the street the first phrase that comes to their mind about HPC storage. While I could envision many answers, I doubt “object storage” would rise to the top of the list. In most configurations, object storage is associated with “colder”, archival tier storage, typically valuing capacity over IOPS and bandwidth, the opposite of “high performance”. Well, I’m here to tell you to drop those preconceptions, because we at Lenovo are doing some exciting stuff with Intel’s DAOS software.
DAOS, or Distributed Asynchronous Object Storage, is a scale-out HPC storage stack that uses the object storage paradigm to bypass some of the limitations of traditional parallel file system architectures. In particular, DAOS was designed to avoid the POSIX I/O layer. POSIX, or Portable Operating System Interface, is a venerable operating system interface standard that is widely used, including by most parallel file systems. Having been around for over 30 years, POSIX was designed at a time when almost all storage was on spinning disk. As such, certain I/O behaviors that were optimized for spinning disk are no longer optimal with SSD or NVMe, negating some of the benefits of flash.
Rather than standardize on POSIX, DAOS gets around these limitations with a ground-up redesign of I/O architecture. Instead of locking a block of data with I/O (as POSIX does), DAOS leverages an “epoch” model that allows multiple I/Os to occur concurrently, and then essentially “rewinds” after the fact to “play back” the I/O events when block access is no longer under contention. This allows DAOS to excel in cases where traditional parallel file systems get bogged down, such as small random writes and reads. For a much more thorough synopsis of DAOS and its architecture, check out this paper published by my colleagues at Intel and Lenovo from the SC Asia 2020 Proceedings.
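To make the epoch idea a little more concrete, here is a toy Python sketch. It is not the DAOS API and not how DAOS is implemented; the class `EpochStore` and its methods are hypothetical names invented for illustration. It only shows the general pattern described above: writers append epoch-tagged records without taking a lock, and a reader later resolves what a block looked like at a given epoch.

```python
class EpochStore:
    """Toy versioned store (hypothetical, not DAOS): writes are appended
    with an epoch tag, and reads are resolved against a chosen epoch."""

    def __init__(self):
        # key -> list of (epoch, value) records; nothing is overwritten in place
        self._log = {}

    def write(self, key, epoch, value):
        # No lock is taken on the key: each write is simply an appended record.
        self._log.setdefault(key, []).append((epoch, value))

    def read(self, key, epoch):
        # "Play back" the records visible at this epoch; the highest epoch wins.
        visible = [rec for rec in self._log.get(key, []) if rec[0] <= epoch]
        if not visible:
            return None
        return max(visible, key=lambda rec: rec[0])[1]


store = EpochStore()
store.write("block-7", 2, "B")   # a later write that happens to arrive first
store.write("block-7", 1, "A")   # an earlier write that arrives second
state_at_1 = store.read("block-7", 1)
state_at_2 = store.read("block-7", 2)
```

In this sketch the arrival order of the two writes does not matter, because the reader orders them by epoch rather than by who got to the block first.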
So, where does the hardware come in and why does it matter? Well, as alluded to above, DAOS is flash-native: designed from the ground up with the intent to utilize Storage Class Memory and NVMe. That means that any storage server running DAOS needs a robust capacity for flash. Additionally, DAOS leverages the key-value scheme from the object storage paradigm, storing metadata (and even small I/Os) separately from larger I/Os, with the metadata (the key) residing on storage class memory and the data (the value) residing on NVMe. This is crucial, since the combination of persistent memory modules (PMMs) and NVMe allows the use of low-endurance drives without overloading them with small writes, providing higher performance at a lower price point compared to other NVMe offerings.
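The placement rule described above can be sketched in a few lines of Python. This is an illustration only: the function `choose_tier` and the 4 KiB cutoff `SMALL_IO_THRESHOLD` are assumptions made up for the example, not DAOS names or DAOS defaults.

```python
SMALL_IO_THRESHOLD = 4096  # bytes; hypothetical cutoff chosen for illustration


def choose_tier(payload: bytes, is_metadata: bool = False) -> str:
    """Return the tier a write would land on in this toy model."""
    # Metadata and small I/Os go to storage class memory, which keeps
    # small writes away from the low-endurance NVMe drives.
    if is_metadata or len(payload) < SMALL_IO_THRESHOLD:
        return "scm"
    # Larger I/Os go to NVMe, where bulk capacity and bandwidth live.
    return "nvme"


small_write = choose_tier(b"x" * 100)
bulk_write = choose_tier(b"x" * (1 << 20))
metadata_write = choose_tier(b"x" * (1 << 20), is_metadata=True)
```

The point of the split is that the NVMe tier only ever sees the larger writes, which is what makes lower-endurance drives a reasonable choice.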
Lenovo’s ThinkSystem® SR630 represents an optimal server platform for running DAOS: a 1U node with two of Intel’s 2nd Generation Intel® Xeon® Scalable processors, up to 8x NVMe drives, and up to 12x Intel Persistent Memory Modules (PMMs) for the storage class memory. The thin form factor and no-compromise design mean ThinkSystem® SR630 can scale to any need. Additionally, with lots of options for Ethernet, InfiniBand, and Omni-Path fabrics, it’s easy to add a configuration to an existing cluster or storage tier.
For those looking to learn more about DAOS, and Lenovo and Intel’s plans around it, there is a flurry of activity going on these days. While the in-person Supercomputing Conference (SC) was cancelled, virtual SC is going strong and will have demo sessions and tech talks from Intel and Lenovo going in-depth on roadmap, features, and performance. Reach out to hpc(delete this)storage at lenovo.com to get more information. Additionally, for more evidence of DAOS’s excellent performance characteristics, check out the latest IO500 rankings and see just how powerful and revolutionary this solution stack is.
As time draws nearer to the bring-up of the Aurora 21 cluster at Argonne (featuring a massive DAOS installation), I anticipate more and more customers looking to try this new tech out themselves. And that’s one of the best parts about DAOS: it’s open source! To the millions (or dozens) of HPC storage enthusiasts addressed at the start of this article: go to GitHub and check out DAOS today. And watch this space! For both Intel and Lenovo, I suspect this is just the start of a great partnership. Lenovo is committed to bringing the building block technologies of the next generation of HPC, like DAOS, to clusters of any size. It’s what we call “From Exascale to Everyscale®”.