NVMe over Fabrics and GPU Direct Storage Boost HPC and AI Edge Applications

In this special guest feature, Tim Miller, VP of Product Marketing at One Stop Systems (OSS), discusses how deploying edge HPC solutions – instead of moving data over relatively slow or insecure networks to distant datacenters – provides significant benefits in cost, responsiveness, and security.

Edge HPC applications generate and process massive amounts of data. In many of these applications, data IO bottlenecks are becoming as challenging as the large-scale data processing itself. Edge HPC applications are proliferating because traditional centralized datacenter and cloud computing models cannot support real-time decision making. Deploying edge HPC solutions – instead of moving data over relatively slow or insecure networks to distant datacenters – provides significant benefits in cost, responsiveness, and security. Real-time decisions require sourcing and storing raw data, then converting it to actionable intelligence with high-speed computing in the field, close to the data source.

Applications include varied use cases such as autonomous vehicle development for cars, long-haul trucks, and delivery vans; intelligent video analytics for in-field industrial surveillance, security, operation, and maintenance; and sensor data analytics for air-, sea-, and land-based mobile defense systems for threat detection, mitigation, and command and control. The common elements across all of these applications are high-bandwidth data acquisition from a wide range of sensors and cameras, storage subsystems for collecting and retaining these data streams, and high-speed compute engines that run the algorithms and inferencing needed to drive real-time actions.

In many cases, these applications address their compute requirements with multiple GPUs or specialized processors that provide parallel computing, allowing very large data sets to be analyzed simultaneously. These compute engines require large-scale IO bandwidth to ensure data is available the moment they are ready to process it. With the ever-advancing capability of GPU subsystems, the bottleneck for many of these applications has moved to the storage subsystem, requiring new innovations in data IO architectures.

There are a number of technologies and innovations that address this data IO challenge. The first is the adoption of PCIe Gen 4 as the fundamental system interconnect within the storage subsystem. PCIe Gen 4 doubles internal bandwidth over Gen 3, delivering 16 GT/s (gigatransfers per second) per lane, which yields roughly 64 GB/s of bandwidth across a standard 16-lane link (duplex) between PCIe components including the host processor, storage drives, network interfaces, and compute accelerators. The next critical technology is NVMe SSDs, which use the high-bandwidth, low-latency PCIe bus to move data directly to and from solid-state storage. With a typical x4 PCIe connection, each drive has roughly 16 GB/s of (full-duplex) bandwidth at Gen 4 speeds. The NVMe protocol also eliminates the overhead associated with legacy storage protocols.
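As a quick sanity check on those figures, the sketch below works through the PCIe Gen 4 arithmetic, including the 128b/130b line encoding that puts usable per-direction throughput slightly below the round numbers (the 64 GB/s and 16 GB/s figures above count both directions of the link):

```python
# Back-of-the-envelope PCIe Gen 4 bandwidth math (illustrative only).
GT_PER_SEC = 16           # PCIe Gen 4 raw signaling rate per lane (gigatransfers/s)
ENCODING = 128 / 130      # Gen 3/4 use 128b/130b encoding: ~1.5% line overhead
BITS_PER_BYTE = 8

def link_bandwidth_gb_s(lanes: int, duplex: bool = False) -> float:
    """Usable payload bandwidth of a PCIe Gen 4 link, in GB/s."""
    per_lane = GT_PER_SEC * ENCODING / BITS_PER_BYTE  # ~1.97 GB/s per lane, per direction
    one_direction = per_lane * lanes
    return one_direction * (2 if duplex else 1)

print(f"x16 one direction: {link_bandwidth_gb_s(16):.1f} GB/s")        # ~31.5
print(f"x16 duplex:        {link_bandwidth_gb_s(16, True):.1f} GB/s")  # ~63.0, the '64 GB/s'
print(f"x4 NVMe duplex:    {link_bandwidth_gb_s(4, True):.1f} GB/s")   # ~15.8, the '16 GB/s'
```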

However, even with PCIe Gen 4 and the latest NVMe SSDs, the performance of the storage subsystem can still be limited by traditional data movement architectures. Data movement technologies are now available to address these fundamental problems, including NVMe over Fabrics (NVMe-oF) and GPU Direct Storage.

Figure 1: Disaggregated, modular HPC edge solutions built from OSS AI on the Fly building blocks, including PCIe Gen 4 Expansion Optimized Servers and Edge Optimized Storage and GPU Expansion platforms.

The first limitation is confronted when adding external storage, either to increase capacity beyond direct-attached storage or to support shared storage functionality. The introduction of NVMe-oF is critical in addressing the potential performance impact of this external storage. NVMe-oF is a protocol specification designed to connect compute processors to storage across a network fabric using the NVMe protocol. NVMe-oF enables powerful disaggregated architectures in which external storage performs as well as internal storage, with minimal latency impact.
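To make the disaggregation concrete, here is a minimal sketch of how a Linux host might attach an NVMe-oF target using the standard nvme-cli tool, driven from Python. The transport, address, port, and subsystem NQN below are hypothetical placeholders for illustration, not values from this article:

```python
# Minimal sketch: attach a remote NVMe-oF subsystem so it appears as a local
# /dev/nvmeXnY block device. Requires Linux, nvme-cli, and root privileges.
# The transport/address/NQN values are hypothetical placeholders.
import subprocess

TRANSPORT = "rdma"                        # or "tcp" for NVMe/TCP fabrics
TARGET_ADDR = "192.168.1.50"              # placeholder fabric-attached storage IP
TARGET_PORT = "4420"                      # conventional NVMe-oF service port
SUBSYS_NQN = "nqn.2024-01.com.example:edge-storage"  # placeholder NQN

# Discover the subsystems exported by the target.
subprocess.run(["nvme", "discover", "-t", TRANSPORT,
                "-a", TARGET_ADDR, "-s", TARGET_PORT], check=True)

# Connect; the remote namespaces then show up as ordinary NVMe block devices,
# which is what lets external storage behave like internal storage.
subprocess.run(["nvme", "connect", "-t", TRANSPORT, "-n", SUBSYS_NQN,
                "-a", TARGET_ADDR, "-s", TARGET_PORT], check=True)
```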

A final limitation, and a cause of IO performance bottlenecks, is the traditional data movement path through the host processor and memory. When data moves from the network to storage, or from storage to the compute elements, it is first copied into system memory, which hurts not only bandwidth but also latency and CPU utilization. The solution is to use direct memory access (DMA) to bypass the host CPU, via both GPU Direct and GPU Direct Storage. These technologies let data move directly between the network interfaces, storage, and compute engines without passing through the host CPU (a code sketch follows at the end of this section). The combination of these technologies will allow HPC edge solutions to continue to scale, with ever-increasing processing performance balanced by ever-increasing data IO capability.

OSS has developed its AI on the Fly® portfolio of building-block products to address HPC at the edge and is now incorporating all of these data movement optimizations. AI on the Fly building blocks use the latest high-performance technology, including CPUs, GPUs, NVMe storage, and high-speed data acquisition, all interconnected with PCIe Gen 4. Through its Ion Accelerator® software stack, OSS is now adding support for NVMe-oF, GPU Direct, and GPU Direct Storage, enabling disaggregated, flexible, modular edge HPC solutions.

OSS’ AI on the Fly solutions target deployment at the edge, outside traditional datacenter environments. These environments are often harsh and rugged, and in many cases solutions must meet unique criteria for shock and vibration, humidity, altitude, and wide operating temperature ranges. For more information about OSS’ AI on the Fly® PCIe Gen 4 product portfolio, visit: www.onestopsystems.com.
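As promised above, here is a minimal sketch of the GPU Direct Storage data path, using the open-source RAPIDS KvikIO library, a Python wrapper over NVIDIA's cuFile API. This is illustrative only, assuming a CUDA GPU with a GDS-capable NVMe device and driver stack; the file path is a placeholder, and on systems without GDS support KvikIO falls back to a conventional copy path:

```python
# Minimal sketch: move data between NVMe and GPU memory with GPUDirect Storage
# via RAPIDS KvikIO. Assumes the cupy and kvikio packages; path is a placeholder.
import cupy
import kvikio

N = 1 << 20  # 1M float32 elements (~4 MB)

# Write device-resident data out to NVMe: DMA from GPU memory to storage.
src = cupy.arange(N, dtype=cupy.float32)
f = kvikio.CuFile("/mnt/nvme/sample.bin", "w")
f.write(src)
f.close()

# Read it straight back into a GPU buffer: storage -> GPU, no CPU bounce buffer.
dst = cupy.empty(N, dtype=cupy.float32)
f = kvikio.CuFile("/mnt/nvme/sample.bin", "r")
f.read(dst)
f.close()

assert bool((src == dst).all())
```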

Disclaimer: This article may contain forward-looking statements based on One Stop Systems’ current expectations and assumptions regarding the company’s business and the performance of its products, the economy and other future conditions and forecasts of future events, circumstances, and results. Ion Accelerator® is a registered trademark used under license by One Stop Systems.

About the Author

Tim Miller is Vice President of Product Marketing at One Stop Systems. Tim has over 33 years of experience in high-tech operations, management, marketing, business development, and sales. He was previously CEO of Dolphin Interconnect Solutions and CEO and founder of StarGen, Inc. Tim holds a Bachelor of Science in Engineering from Cornell University, a Master of Business Administration from Wharton, and a Master’s in Computer Science from the University of Pennsylvania.
