How NVM Will Shake up Supercomputing

In this special guest feature, our insideHPC Performance Guru Bill D’Amico looks at a recent panel discussion on the future of mass storage at the Lustre Users Group (LUG) conference. This thought-provoking session resonated strongly with discussions of non-volatile memory (NVM) at the OpenFabrics developers’ conference.

Much of the discussion at LUG was about how spinning media will go away in the next six years. Exactly what that means for high-performance storage isn’t clear, and there was much talk of burst buffers. Burst buffers make sense if NVM behaves much as SSD-based flash storage does today, with read and write latencies that are much faster than spinning media but nowhere near RAM latency. The potential exists, however, for NVM to work at close to DRAM latency, making it just another type of RAM in the system, one with the special property of persistence.


The following thumbnail analysis provides some food for thought.

Currently, laboratory versions of NVM using new technologies have latencies in the 5–3,000 nanosecond range. DRAM latencies are in the 20–50 nanosecond range, putting the two types of memory much closer to each other than either is to disk, which has latencies in the 3–6 millisecond range (3,000,000–6,000,000 nanoseconds). NVM latencies also look to be lower than SSD latencies, which are 20–40 microseconds (20,000–40,000 nanoseconds).
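To make the comparison concrete, here is a minimal Python sketch that puts the quoted ranges on a common nanosecond scale and expresses each as a multiple of DRAM latency. The figures come straight from the paragraph above; the function name `slowdown_vs_dram` and the choice to pair each technology's low end with DRAM's high end (and vice versa) are illustrative assumptions, not part of the panel's analysis.

```python
# Latency ranges quoted in the article, as (low, high) in nanoseconds.
LATENCY_NS = {
    "DRAM": (20, 50),
    "NVM (lab)": (5, 3_000),
    "SSD": (20_000, 40_000),
    "Disk": (3_000_000, 6_000_000),
}


def slowdown_vs_dram(device):
    """Return (best, worst) latency of `device` as a multiple of DRAM latency.

    Best case: the device's low end against DRAM's high end.
    Worst case: the device's high end against DRAM's low end.
    (This pairing is an assumption made for illustration.)
    """
    low, high = LATENCY_NS[device]
    dram_low, dram_high = LATENCY_NS["DRAM"]
    return low / dram_high, high / dram_low


for name in LATENCY_NS:
    best, worst = slowdown_vs_dram(name)
    print(f"{name:10s} {best:>10,.1f}x to {worst:>11,.1f}x DRAM latency")
```

On these numbers, laboratory NVM ranges from faster than DRAM to about 150 times slower, while SSD starts at 400 times slower and disk at 60,000 times slower.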


These low latencies mean it may be sensible for the CPU to stall, simply waiting for the access to complete, rather than block and switch to other work while waiting for “disk”. Think of NVM as a new form of NUMA memory that is persistent. System designs could take advantage of this for checkpointing: enable a second class of store operation that writes to both NVM and DRAM, with a checkpoint completion flag returned when the NVM write completes.
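The dual-write checkpoint idea can be sketched in software. The toy Python model below is only an illustration under stated assumptions: a memory-mapped file stands in for NVM, a `bytearray` stands in for DRAM, and the class `DualStore` and its `store` method are hypothetical names. A real design would presumably implement this as a hardware store operation rather than a library call.

```python
import mmap
import os
import tempfile


class DualStore:
    """Toy model of a store that writes to both DRAM and NVM (hypothetical)."""

    def __init__(self, size, nvm_path):
        self.dram = bytearray(size)              # volatile working copy
        self._fd = os.open(nvm_path, os.O_RDWR | os.O_CREAT)
        os.ftruncate(self._fd, size)             # size the backing file
        self.nvm = mmap.mmap(self._fd, size)     # stand-in for persistent memory

    def store(self, offset, data):
        """Write `data` to both copies; return True once the NVM copy is flushed."""
        end = offset + len(data)
        if offset < 0 or end > len(self.dram):
            raise ValueError("write falls outside the store")
        self.dram[offset:end] = data
        self.nvm[offset:end] = data
        self.nvm.flush()                         # push the mapped bytes to the file
        return True                              # the checkpoint completion flag

    def close(self):
        self.nvm.close()
        os.close(self._fd)


# Usage: one checkpointed write, with the flag returned after the flush.
path = os.path.join(tempfile.mkdtemp(), "nvm.img")
store = DualStore(64, path)
done = store.store(0, b"step-42")
store.close()
```

The caller in this sketch stalls inside `store` until the flush returns, which mirrors the stall-rather-than-block behavior described above. That only makes sense if the persistent write is nearly as fast as a memory access.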

Regardless of new system designs, it seems clear that disk storage will be less important in the future. Relatively slow SSDs have already cut down on the amount of traffic to spinning disks. This trend can’t help but accelerate as faster NVM is introduced.

The panel at the Lustre Users Group expects that by 2020 HPC won’t be using disk storage for either final or intermediate results. Final results will be archived to tape or some other very low-cost, high-latency storage (after visualization and analysis). Intermediate data will live in NVM in the compute partition. The software systems needed to do parallel I/O to spinning disks will therefore lose their value as NVM comes into systems. The panel also discussed the explosion of data from sensors, but here again NVM will remove the need for high-performance disk I/O. The only question is how fast the cost of low-latency NVM will come down. In 2014, NVM is still at the proof-of-concept stage, spinning disks are ubiquitous, and there remains a compelling need for software that enables high-bandwidth I/O to spinning disks.

A future article will look at the memory landscape over the next few years. There are significant developments in both NVM and traditional volatile RAM, which does not retain information after a power failure, and they will profoundly impact HPC system architectures.
