For those who haven’t been following the details of one of DOE’s more recent procurement rounds, the NERSC-8 and Trinity request for proposals (RFP) explicitly required that all vendor proposals include a burst buffer to absorb the tremendous amounts of data that multi-petaflop simulations can dump in very short order. The target use case is petascale checkpoint-restart, where the memory of thousands of nodes (hundreds of terabytes of data) needs to be flushed to disk quickly enough that the writes don’t dominate the overall execution time of the calculation.
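To get a feel for the checkpoint-restart arithmetic, here is a back-of-the-envelope sketch. The function names and every number in it (a 200 TB checkpoint, a 400-second write window, an hour of compute between checkpoints) are illustrative assumptions of mine, not figures from the RFP.

```python
def required_bandwidth_gbs(checkpoint_tb, write_window_s):
    """Aggregate write bandwidth (GB/s) needed to flush one checkpoint
    of `checkpoint_tb` terabytes within `write_window_s` seconds.
    Uses decimal units: 1 TB = 1000 GB."""
    if checkpoint_tb <= 0 or write_window_s <= 0:
        raise ValueError("checkpoint size and write window must be positive")
    return checkpoint_tb * 1000 / write_window_s


def checkpoint_overhead(write_window_s, compute_interval_s):
    """Fraction of wall-clock time spent writing checkpoints when each
    `compute_interval_s` seconds of computation is followed by one
    checkpoint that takes `write_window_s` seconds to write."""
    if write_window_s < 0 or compute_interval_s <= 0:
        raise ValueError("write window must be >= 0 and interval must be > 0")
    return write_window_s / (compute_interval_s + write_window_s)


# Hypothetical inputs, chosen only to illustrate the scale involved.
bandwidth = required_bandwidth_gbs(checkpoint_tb=200, write_window_s=400)
overhead = checkpoint_overhead(write_window_s=400, compute_interval_s=3600)
print(f"needed bandwidth: {bandwidth:.0f} GB/s, overhead: {overhead:.0%}")
```

Under these assumed numbers, holding checkpoint time to about a tenth of the run means sustaining roughly half a terabyte per second of aggregate write bandwidth, which is the kind of burst a fast intermediate storage tier is meant to absorb.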
What exactly a burst buffer is remains poorly defined. I got the sense that there are two prevailing definitions:
- The NERSC burst buffer is something more tightly integrated on the compute side of the system and may be a resource that can be allocated on a per-job basis.
- The Argonne burst buffer is something more tightly integrated on the storage side of the system and acts in a fashion that is largely transparent to the user. This sounded a lot like the burst buffer support being explored for Lustre.
In addition, Los Alamos National Laboratory (LANL) is exploring burst buffers for the Trinity procurement, and it wasn’t clear to me whether they had chosen a definition or were exploring all angles. One commonality is that DOE is going full-steam ahead on providing this burst buffer capability in some form or another, and solid-state storage is going to be a central enabling component.