ExaIO: Access and Manage Storage of Data Efficiently and at Scale on Exascale Systems

As the word exascale implies, the forthcoming generation of exascale supercomputer systems will deliver 10^18 flop/s of scalable computing capability. All that computing capability will be for naught if the storage hardware and I/O software stack cannot meet the storage needs of applications running at scale: applications will either drown in data when attempting to write to storage or starve while waiting to read data from storage. Suren Byna, PI of the ExaIO project in the Exascale Computing Project (ECP) and staff computer scientist at Lawrence Berkeley National Laboratory, highlights the importance of preparing to meet the I/O needs of exascale supercomputers by noting that storage is typically the last subsystem available for testing on these systems.
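To make the scale problem concrete, the sketch below shows the kind of operation an exascale I/O stack must make efficient: every MPI rank writing its slice of one shared file through a single collective MPI-IO call. This is generic MPI-IO, not ExaIO code, and the file name and buffer size are illustrative.

```c
/* Minimal sketch: each MPI rank writes its own slice of one shared file
 * using a collective MPI-IO call. Illustrative only; the file name and
 * sizes are arbitrary, and error handling relies on MPI's default aborts. */
#include <mpi.h>
#include <stdlib.h>

#define COUNT 1048576  /* doubles per rank, ~8 MiB */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(COUNT * sizeof(double));
    for (int i = 0; i < COUNT; i++)
        buf[i] = (double)rank;  /* fill with rank-specific data */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank's slice starts at a rank-dependent byte offset. */
    MPI_Offset offset = (MPI_Offset)rank * COUNT * sizeof(double);

    /* Collective write: the MPI-IO layer can aggregate and reorder
     * requests across ranks instead of issuing many independent ones. */
    MPI_File_write_at_all(fh, offset, buf, COUNT, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```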

DDN’s HPC Trends Survey: Complex I/O Workloads are the #1 Challenge

Today DDN announced the results of its annual HPC Trends survey, which reflect the continued adoption of flash-based storage as essential to respondents' overall data center strategies. While flash is deemed essential, respondents anticipate needing additional technology innovations to unlock the full performance of their HPC applications. Managing complex I/O workload performance remains far and away the largest challenge for survey respondents, with 60 percent of end users citing it as their number one challenge.

HPC I/O for Computational Scientists

Phil Carns from Argonne gave this talk at the 2017 Argonne Training Program on Extreme-Scale Computing. “Darshan is a scalable HPC I/O characterization tool. It captures an accurate but concise picture of application I/O behavior with minimal overhead. Darshan was originally developed on the IBM Blue Gene series of computers deployed at the Argonne Leadership Computing Facility, but it is portable across a wide variety of platforms, including the Cray XE6, Cray XC30, and Linux clusters. Darshan routinely instruments jobs using up to 786,432 compute cores on the Mira system at ALCF.”
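Darshan's instrumentation is transparent, so an application needs no source changes; a toy writer like the sketch below is enough to produce a characterization log once the Darshan library is linked in or preloaded at run time. The file names and loop count here are illustrative, not from the talk.

```c
/* Toy MPI program whose POSIX I/O Darshan can capture transparently;
 * no Darshan-specific code is required. Names and counts are illustrative. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One small file per rank: exactly the kind of pattern (many tiny
     * writes) that shows up clearly in Darshan's per-job counters. */
    char path[64];
    snprintf(path, sizeof(path), "rank%05d.out", rank);

    FILE *fp = fopen(path, "w");
    if (fp) {
        for (int i = 0; i < 1000; i++)
            fprintf(fp, "rank %d line %d\n", rank, i);  /* many small writes */
        fclose(fp);
    }

    MPI_Finalize();
    return 0;
}
```

On systems where Darshan is deployed, the resulting per-job log can then be rendered into readable counters with the darshan-parser utility.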

Video: Exploring I/O Challenges at Exascale

“Clear trends in past and current petascale systems (e.g., Jaguar and Titan) and the new generation of systems that will transition us toward exascale (e.g., Aurora and Summit) show that concurrency and peak performance are growing dramatically while I/O bandwidth remains stagnant. In this talk, we explore the challenges of dealing with I/O-ignorant high performance computing systems and opportunities for integrating I/O awareness into these systems.”

Video: Parallel I/O Best Practices

In this video from the 2016 Blue Waters Symposium, Andriy Kot from NCSA presents: Parallel I/O Best Practices.
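A recurring best practice in talks like this is to pass tuning information to the MPI-IO layer rather than hand-tuning each rank. The sketch below sets two standard ROMIO hints through an MPI_Info object; whether these particular hints help, and what values work best, depends on the file system and should be measured rather than assumed.

```c
/* Sketch: passing I/O hints to MPI-IO via MPI_Info. The hint names are
 * standard ROMIO hints, but useful values are system-dependent. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Info info;
    MPI_Info_create(&info);

    /* Enable collective buffering and use 8 aggregator nodes (illustrative). */
    MPI_Info_set(info, "romio_cb_write", "enable");
    MPI_Info_set(info, "cb_nodes", "8");

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* ... collective writes such as MPI_File_write_at_all() go here ... */

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```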

Understanding I/O Patterns at the Block Level with ioprof

Over at Admin HPC, Intel’s Jeff Layton writes that understanding how data makes its way from the application down to the storage devices is key to understanding how I/O works, and that monitoring the lowest level of the I/O stack, the block driver, is a crucial part of that picture.
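ioprof itself drives kernel block tracing, but the kernel also exposes simple per-device counters that make the block level tangible. The sketch below is not ioprof code; it reads /proc/diskstats, a standard Linux interface, and prints the completed read and write counts for each block device.

```c
/* Sketch (not ioprof): print completed read/write I/Os per block device
 * from /proc/diskstats, the kernel's per-device I/O counters. */
#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("/proc/diskstats", "r");
    if (!fp) {
        perror("fopen /proc/diskstats");
        return 1;
    }

    char line[512];
    while (fgets(line, sizeof(line), fp)) {
        unsigned int major, minor;
        char dev[64];
        unsigned long long reads, writes;

        /* Fields: major minor name reads-completed reads-merged
         * sectors-read ms-reading writes-completed ... (see the kernel's
         * Documentation/admin-guide/iostats.rst). %*u skips a field. */
        if (sscanf(line, " %u %u %63s %llu %*u %*u %*u %llu",
                   &major, &minor, dev, &reads, &writes) == 5)
            printf("%-12s reads=%llu writes=%llu\n", dev, reads, writes);
    }

    fclose(fp);
    return 0;
}
```

Counters like these show aggregate volume; tools such as ioprof go further by recording individual block requests, which is what reveals the access patterns Layton describes.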