DAOS: Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergence


Johann Lombardi is a Principal Engineer at Intel.

In this video, Johann Lombardi from Intel presents: DAOS – Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergence. As an all-new parallel file system, DAOS will be a key component of the upcoming Aurora supercomputer coming to Argonne National Laboratory in 2021.

Intel has been building an entirely open source software ecosystem for data-centric computing, fully optimized for Intel architecture and non-volatile memory (NVM) technologies, including Intel Optane DC persistent memory and Intel Optane DC SSDs. Distributed Asynchronous Object Storage (DAOS) is the foundation of the Intel exascale storage stack. DAOS is an open source software-defined scale-out object store that provides high bandwidth, low latency, and high I/O operations per second (IOPS) storage containers to HPC applications. It enables next-generation data-centric workflows that combine simulation, data analytics, and AI.

Unlike traditional storage stacks that were primarily designed for rotating media, DAOS is architected from the ground up to make use of new NVM technologies, and it is extremely lightweight because it operates end-to-end in user space with full operating system bypass. DAOS offers a shift away from an I/O model designed for block-based, high-latency storage to one that inherently supports fine-grained data access and unlocks the performance of next-generation storage technologies.

Existing distributed storage systems use high-latency peer-to-peer communication, whereas DAOS is designed to use low-latency, high-message-rate user-space communications that bypass the operating system. Most storage systems today are designed for block I/O, where all I/O operations go through the Linux kernel with a block interface. Much work has been done to optimize access to block devices (through coalescing, buffering, and aggregation, for example). None of those optimizations, however, is relevant to the next-generation storage devices that Intel is targeting, and they incur unnecessary overhead if used. DAOS, on the other hand, is designed to optimize access to Intel Optane DC persistent memory and NVM Express (NVMe) solid state drives (SSDs), and it avoids that overhead.
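To make the granularity argument concrete, here is a minimal, hypothetical calculation in Python. It assumes a 4 KiB block size and models persistent memory as simply byte-addressable; neither figure comes from the talk, and real devices (and DAOS itself) have more nuanced write paths.

```python
# Hypothetical illustration: how many bytes a device must rewrite to apply a
# small update through a block interface versus a byte-addressable medium.
# The 4 KiB block size is an assumption for the example, not a DAOS parameter.

BLOCK_SIZE = 4096  # assumed block size in bytes


def block_io_bytes(offset: int, length: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes rewritten when every update must cover whole blocks
    (a read-modify-write of each block the update touches)."""
    if length <= 0:
        return 0
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return (last_block - first_block + 1) * block_size


def byte_granular_bytes(offset: int, length: int) -> int:
    """Bytes written when the medium accepts updates at byte granularity."""
    return max(length, 0)


if __name__ == "__main__":
    # A 64-byte metadata update that straddles a block boundary.
    off, n = 4064, 64
    print(block_io_bytes(off, n))       # 8192
    print(byte_granular_bytes(off, n))  # 64
```

In this toy model a 64-byte update costs two full blocks on the block path, which is the kind of overhead the paragraph says block-oriented optimizations cannot remove for small, fine-grained accesses.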

DAOS servers maintain their metadata on persistent memory, with bulk data going straight to NVMe SSDs. In addition, small I/O operations will be absorbed on the persistent memory before being aggregated and then migrated to the larger-capacity flash storage. DAOS uses the Persistent Memory Development Kit (PMDK) to provide transactional access to persistent memory and the Storage Performance Development Kit (SPDK) for user-space I/O to NVMe devices. This architecture allows for data-access times that can be several orders of magnitude faster than in existing storage systems (microseconds [μs] versus milliseconds [ms]).
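The placement policy described in this paragraph can be modelled roughly as follows. This is a hypothetical Python sketch, not DAOS code: the 4 KiB threshold, the `TieredTarget` class and its methods, and the byte-by-byte aggregation are all assumptions chosen for clarity rather than how PMDK- or SPDK-backed storage actually works.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed cut-off between "small" and "bulk" I/O; illustrative only.
SMALL_IO_THRESHOLD = 4096


@dataclass
class TieredTarget:
    """Toy storage target with a persistent-memory (PM) tier and an NVMe tier."""
    pm_log: List[Tuple[int, bytes]] = field(default_factory=list)        # small writes absorbed on PM
    nvme_extents: List[Tuple[int, bytes]] = field(default_factory=list)  # bulk data on NVMe
    metadata_updates: int = 0                                             # metadata kept on PM

    def write(self, offset: int, data: bytes) -> str:
        """Record the update and return the tier that received the data."""
        self.metadata_updates += 1  # every update touches metadata on PM
        if len(data) < SMALL_IO_THRESHOLD:
            self.pm_log.append((offset, data))
            return "pm"
        self.nvme_extents.append((offset, data))
        return "nvme"

    def aggregate(self) -> int:
        """Replay buffered small writes in order (later writes win), merge them
        into contiguous extents, migrate those to NVMe, and return the count."""
        image: Dict[int, int] = {}  # byte offset -> byte value
        for offset, data in self.pm_log:
            for i, byte in enumerate(data):
                image[offset + i] = byte
        self.pm_log.clear()

        extents: List[Tuple[int, bytes]] = []
        start, buf = None, bytearray()
        for pos in sorted(image):
            if start is not None and pos == start + len(buf):
                buf.append(image[pos])  # extends the current contiguous run
            else:
                if start is not None:
                    extents.append((start, bytes(buf)))
                start, buf = pos, bytearray([image[pos]])
        if start is not None:
            extents.append((start, bytes(buf)))

        self.nvme_extents.extend(extents)
        return len(extents)
```

For example, two 16-byte writes at offsets 0 and 16 both land on PM, an 8 KiB write goes straight to NVMe, and a later `aggregate()` call migrates the two small writes as a single 32-byte extent.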

Application Interface and I/O Middleware Integration

The DAOS client library is designed to have a small footprint, to minimize noise on the compute nodes, and to support non-blocking operations with explicit progress. DAOS operations are function-shipped to the DAOS storage servers over libfabric, the OpenFabrics Interfaces (OFI) library, taking advantage of any remote direct memory access (RDMA) capabilities in the fabric.
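"Non-blocking operations with explicit progress" means that submitting an operation returns immediately and nothing completes until the application itself drives progress, so no hidden background thread competes with the computation. The Python sketch below illustrates only that pattern; `Event`, `EventQueue`, `submit`, and `progress` are hypothetical names, not the DAOS client API.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple


class Event:
    """Completion handle for one non-blocking operation (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.done = False
        self.result: Optional[object] = None


class EventQueue:
    """Toy client-side queue. There is no background thread: operations only
    complete when the application calls progress()."""

    def __init__(self) -> None:
        self._pending: Deque[Tuple[Event, Callable[[], object]]] = deque()

    def submit(self, name: str, work: Callable[[], object]) -> Event:
        """Queue an operation and return its handle without running it."""
        ev = Event(name)
        self._pending.append((ev, work))
        return ev

    def progress(self, max_ops: int = 1) -> List[Event]:
        """Drive up to max_ops pending operations to completion and return them."""
        completed: List[Event] = []
        while self._pending and len(completed) < max_ops:
            ev, work = self._pending.popleft()
            ev.result = work()  # stand-in for the server's reply arriving
            ev.done = True
            completed.append(ev)
        return completed
```

A caller would submit a fetch and an update, do useful computation while both are still pending, and then call `progress()` at points of its own choosing to reap completions, which keeps the client's footprint on the compute node predictable.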

In this new storage paradigm, POSIX is no longer the foundation for new data models. Instead, the POSIX interface is built as a library on top of the DAOS back-end API, like any other I/O middleware. A POSIX namespace can be encapsulated in a container and mounted by an application into its file system tree. This application-private namespace will be accessible to any task of the application that successfully opened the container. Tools to parse the encapsulated namespace will be provided. Both the data and metadata of the encapsulated POSIX file system will be fully distributed across all the available storage with a progressive layout to help ensure both performance and resilience. In addition, the POSIX emulation offers scalable directory operations, scalable shared-file I/O, scalable file-per-process I/O, and self-healing to recover from failed or corrupted storage.
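To illustrate what it means for POSIX to be a library built on top of an object interface, the hypothetical Python sketch below layers paths over a flat object store standing in for a container. Directories are objects that map entry names to child object IDs, and files are objects holding bytes. The `Container` and `ToyNamespace` classes are invented for this example; this is not the DAOS File System API, and it ignores distribution, layout, concurrency, and self-healing.

```python
from itertools import count
from typing import Any, Dict, List


class Container:
    """Hypothetical stand-in for a container: a flat set of objects keyed by ID,
    each holding a small key-value map."""

    def __init__(self) -> None:
        self._oids = count(1)
        self.objects: Dict[int, Dict[str, Any]] = {}

    def new_object(self, **fields: Any) -> int:
        oid = next(self._oids)
        self.objects[oid] = dict(fields)
        return oid


class ToyNamespace:
    """POSIX-like paths implemented purely in terms of container objects."""

    def __init__(self, cont: Container) -> None:
        self.cont = cont
        self.root = cont.new_object(kind="dir", entries={})

    def _lookup(self, path: str) -> int:
        """Walk the path one directory object at a time, starting at the root."""
        oid = self.root
        for name in filter(None, path.split("/")):
            obj = self.cont.objects[oid]
            if obj["kind"] != "dir" or name not in obj["entries"]:
                raise FileNotFoundError(path)
            oid = obj["entries"][name]
        return oid

    def _create(self, path: str, **fields: Any) -> int:
        """Create a new object and link it into its parent directory object."""
        parent_path, _, name = path.rstrip("/").rpartition("/")
        parent = self.cont.objects[self._lookup(parent_path)]
        if parent["kind"] != "dir":
            raise NotADirectoryError(parent_path)
        if name in parent["entries"]:
            raise FileExistsError(path)
        oid = self.cont.new_object(**fields)
        parent["entries"][name] = oid
        return oid

    def mkdir(self, path: str) -> None:
        self._create(path, kind="dir", entries={})

    def write_file(self, path: str, data: bytes) -> None:
        self._create(path, kind="file", data=data)

    def read_file(self, path: str) -> bytes:
        obj = self.cont.objects[self._lookup(path)]
        if obj["kind"] != "file":
            raise IsADirectoryError(path)
        return obj["data"]

    def listdir(self, path: str) -> List[str]:
        obj = self.cont.objects[self._lookup(path)]
        if obj["kind"] != "dir":
            raise NotADirectoryError(path)
        return sorted(obj["entries"])
```

Because the namespace is just a set of objects inside one container, any task that can open that container can reconstruct the whole tree, which is the application-private behaviour the paragraph describes.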

With ultra-low latency and fine-grained access to persistent storage, Intel Optane DC persistent memory represents a real opportunity to transform the industry and overcome the limitations of storage systems in data centers today. Intel Optane DC SSDs improve the solution further, bringing high IOPS and handling reads and writes concurrently without degradation. Existing distributed storage software, however, was not built for these new technologies, and it can mask the value they provide. A complete rethink of the storage software stack is required: a new design, built from the ground up, that discards optimizations meant for disk drives, embraces fine-grained, low-latency storage access with rich storage semantics, and unlocks the potential of these revolutionary technologies for distributed storage.

DAOS is available on GitHub under the Apache 2.0 license. A new DAOS version is planned every six months; check the DAOS roadmap for more information.

