Architectural Considerations for AI Workloads in Technical Computing

This guest post explores how HPE, Intel and WekaIO are working together to solve potential I/O bottlenecks in machine learning and AI workloads. 

High performance computing (HPC) has proven to be a critical component in advancing science and technology. Whether it is understanding the origins of cancer, improving fuel efficiency, discovering new materials, exploring space or simply improving the predictability of manufacturing lines, HPC is invariably central to the process. The speed at which data yields insight is a function of both computational power and data analysis capability, so advancements in HPC infrastructure have a direct impact on the rate of scientific discovery.

For optimal performance, HPC infrastructure must be viewed as a symbiotic system that balances compute, networking and storage. Any imbalance between these three elements results in wasted resources, both human effort and hardware infrastructure.

New AI, machine learning and analytics workloads have placed significant burdens on traditional HPC systems, which were designed to handle large-file, high-bandwidth workloads. The new analytic workloads often require processing millions of tiny files at very high bandwidth, and they have forced the HPC industry to adopt new storage media types and networking architectures to keep the compute infrastructure utilized to its maximum. The sketch below illustrates the contrast between the two access patterns.
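
To make that contrast concrete, here is an illustrative C sketch comparing one large sequential read against the open/read/close churn of many tiny files. The file names, counts and sizes are invented for illustration; on most file systems the second pattern is dominated by per-file metadata overhead rather than raw bandwidth.

```c
/* Illustrative only: contrasts one large sequential read with the
 * open/read/close churn of many tiny files. File names and counts
 * are invented; neither pattern is from the original post. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    char buf[4096];
    double t0 = now_sec();

    /* Pattern A: one large file streamed sequentially, the classic
     * HPC workload that parallel file systems were tuned for. */
    int fd = open("large.dat", O_RDONLY);
    if (fd >= 0) {
        while (read(fd, buf, sizeof buf) > 0)
            ;
        close(fd);
    }
    double t1 = now_sec();

    /* Pattern B: many tiny files, typical of deep learning training
     * sets; every file costs a metadata lookup plus open/close. */
    for (int i = 0; i < 100000; i++) {
        char path[64];
        snprintf(path, sizeof path, "dataset/img_%06d.jpg", i);
        int f = open(path, O_RDONLY);
        if (f >= 0) {
            (void)read(f, buf, sizeof buf);
            close(f);
        }
    }
    double t2 = now_sec();

    printf("large sequential: %.3f s, tiny files: %.3f s\n",
           t1 - t0, t2 - t1);
    return 0;
}
```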

As network speeds have increased, the performance bottleneck has shifted from bandwidth to latency. WekaIO Matrix on HPE server infrastructure takes advantage of Intel’s open source Data Plane Development Kit (DPDK), supported across Intel’s high performance Ethernet adapters, to improve network performance and reduce latency. Through optimizations in the network stack, Matrix can service I/O requests across a distributed network with application latencies as low as 200 microseconds.
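
The latency win comes largely from kernel bypass: a DPDK application polls NIC queues directly from user space, avoiding interrupts and system calls on the data path. Below is a minimal sketch of a DPDK poll-mode receive loop, with assumed setup choices (port 0, a single queue, default configuration); it is a skeleton of the technique, not WekaIO Matrix’s actual network stack.

```c
/* Minimal DPDK poll-mode receive loop. Assumes one NIC port already
 * bound to a DPDK driver; build against an installed DPDK, e.g. with
 * pkg-config --cflags --libs libdpdk. */
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define RX_RING_SIZE 1024
#define BURST_SIZE   32

int main(int argc, char **argv) {
    /* Initialize the Environment Abstraction Layer:
     * hugepages, cores and poll-mode NIC drivers. */
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL init failed\n");

    /* Packet buffers live in a hugepage-backed mempool,
     * not in kernel socket buffers. */
    struct rte_mempool *pool = rte_pktmbuf_pool_create(
        "mbuf_pool", 8191, 250, 0,
        RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
    if (pool == NULL)
        rte_exit(EXIT_FAILURE, "mempool create failed\n");

    uint16_t port = 0;  /* assumed: first DPDK-bound port */
    struct rte_eth_conf port_conf = {0};
    if (rte_eth_dev_configure(port, 1, 1, &port_conf) != 0 ||
        rte_eth_rx_queue_setup(port, 0, RX_RING_SIZE,
                rte_eth_dev_socket_id(port), NULL, pool) != 0 ||
        rte_eth_tx_queue_setup(port, 0, RX_RING_SIZE,
                rte_eth_dev_socket_id(port), NULL) != 0 ||
        rte_eth_dev_start(port) != 0)
        rte_exit(EXIT_FAILURE, "port setup failed\n");

    /* Busy-poll the receive queue from user space:
     * no interrupts, no context switches on the hot path. */
    for (;;) {
        struct rte_mbuf *bufs[BURST_SIZE];
        uint16_t n = rte_eth_rx_burst(port, 0, bufs, BURST_SIZE);
        for (uint16_t i = 0; i < n; i++)
            rte_pktmbuf_free(bufs[i]); /* a real stack would process the payload */
    }
    return 0;
}
```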

But to take advantage of low latency networks, the storage media has to serve up data at comparable or better latencies. Hard disk drives have been the predominant storage medium for HPC workloads since the field’s inception, but they struggle under latency sensitive workloads due to the seek and rotational delays incurred on every random access. Typical read latencies for a SATA hard disk drive are around 5.56 milliseconds, while Intel’s enterprise NVMe SSDs, at 85 microseconds, are roughly 65 times lower. NVMe SSDs are therefore well positioned to service the I/O demands of low latency applications leveraging WekaIO Matrix software running DPDK over Ethernet.
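
As a quick check, the 65x figure follows directly from the two quoted latencies once both are expressed in microseconds:

\[
\frac{5.56\ \text{ms}}{85\ \mu\text{s}} = \frac{5560\ \mu\text{s}}{85\ \mu\text{s}} \approx 65
\]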

The HPE Deep Learning Cookbook has demonstrated that a distributed, scalable HPC environment based on HPE server architecture, low latency DPDK-enhanced networking, Intel NVMe drives and WekaIO Matrix software can process more than twice the data of a local NVMe drive across a spectrum of popular benchmarks.

For an in-depth analysis on the I/O challenges in HPC for analytics, check out the Evaluator Group White Paper.

For further information about WekaIO and the HPE Deep Learning Cookbook, see the following:

HPE WekaIO Matrix Product page

HPE Deep Learning Cookbook

WekaIO Matrix product page