insideHPC Guide to How Expert Design Engineering and a Building Block Approach Can Give You a Perfectly Tailored AI, ML or HPC Environment – Part 2

In this insideHPC technology guide, “How Expert Design Engineering and a Building Block Approach Can Give You a Perfectly Tailored AI, ML or HPC Environment,” we will present things to consider when building a customized supercomputer-in-a-box system with the help of experts from Silicon Mechanics.

When considering a large, complex system, such as a high-performance computing (HPC) system, supercomputer or compute cluster, you may think you have only two options: build from the ground up yourself, or buy a pre-configured supercomputer-in-a-box from a major technology vendor that everyone else is buying. But there is a third option that takes a best-of-both-worlds approach. It gives you “building blocks” expertly designed around network, storage and compute configurations that are balanced, yet flexible enough to provide scalability for your specific project needs.

Key Consideration #1: Scalability

Design flexibility

Whether you’re building a small proof-of-concept project or aiming for something bigger from the start, you want to protect your investment and be sure that the system will adapt and grow as your project grows. Design flexibility is key here, with the ability to add nodes or racks to the hardware as needed.

Investment

If you use a customized configuration, your initial investment goes further: you don’t need to spend extra money on overhead built into similar, better-known all-in-one systems that may not be as easy to expand incrementally.

Intelligent scalability

It’s important to scale intelligently. It does you no good to have a ton of compute boxes with no ability to feed them the data required for training the model. Scaling intelligently lets you pay only for what you need and avoid an expensive solution that gets bottlenecked on compute, storage, or networking.
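To make that concrete, here is a minimal back-of-the-envelope sketch (in Python, with entirely illustrative throughput figures) of how you might check which subsystem caps a proposed configuration before buying more of the wrong thing.

```python
# Back-of-the-envelope balance check: which subsystem caps overall throughput?
# All figures below are illustrative placeholders, not vendor specifications.

def cluster_bottleneck(compute_gb_s: float, storage_gb_s: float, network_gb_s: float) -> str:
    """Return a summary naming the subsystem whose aggregate data rate (GB/s) is lowest."""
    rates = {
        "compute (data the GPUs/CPUs can ingest)": compute_gb_s,
        "storage (data the file system can serve)": storage_gb_s,
        "network (data the fabric can carry)": network_gb_s,
    }
    limiter = min(rates, key=rates.get)
    return f"Effective rate ~{rates[limiter]:.0f} GB/s, limited by {limiter}"

# Example: 8 compute nodes that can each consume ~6 GB/s, 4 storage nodes
# serving ~8 GB/s each, and a fabric delivering ~40 GB/s end to end.
print(cluster_bottleneck(compute_gb_s=8 * 6, storage_gb_s=4 * 8, network_gb_s=40))
# In this made-up example storage is the limiter, so adding GPUs alone
# would not speed up training.
```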

Future growth

These are some of the reasons why the flexible Silicon Mechanics Atlas AI Cluster configuration is designed to support future growth. Storage nodes and compute nodes can be added seamlessly down the road, and cluster performance scales linearly with each node you add. As your problem set grows, or as compute and storage requirements change or evolve, the system is designed to scale as a whole.

Key Consideration #2: Storage

Scalability without limitations

Large data sets are required to deliver accurate AI results. Having this data drives incredibly large storage demands, and managing these data sets requires a system that can quickly scale without limitations.

“AI is akin to building a rocket ship. You need a huge engine and a lot of fuel. The rocket engine is the learning algorithms but the fuel is the huge amounts of data we can feed to these algorithms.” – Andrew Ng, as quoted in Kevin Kelly’s “The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future”

This often means lots of compute, but it also means being able to feed that compute. Traditional Network Attached Storage (NAS) is bandwidth limited, so AI projects need to leverage an AI-first storage solution to pull in data effectively. Because the compute is so incredibly powerful, you need a storage solution that is purpose-built for AI training scenarios.
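As a rough illustration of why feeding the compute matters, the sketch below estimates the sustained read bandwidth a hypothetical multi-GPU training job would need; every figure is an assumption you would replace with your own measurements.

```python
# Rough sizing of the read bandwidth needed to keep GPUs busy during training.
# All numbers are illustrative assumptions; substitute your own measured
# samples/sec and sample sizes.

gpus             = 16           # GPUs in the training job
samples_per_sec  = 2_000        # samples one GPU consumes per second
bytes_per_sample = 600 * 1024   # ~600 KB per training sample

required_gb_per_s = gpus * samples_per_sec * bytes_per_sample / 1e9
print(f"Sustained read bandwidth needed: ~{required_gb_per_s:.1f} GB/s")

# A single 10 GbE NAS link tops out around 1.25 GB/s, so a traditional NAS
# would starve this job; a parallel or AI-focused storage layer is needed.
nas_link_gb_per_s = 10 / 8
print(f"Shortfall vs. one 10 GbE NAS link: ~{required_gb_per_s - nas_link_gb_per_s:.1f} GB/s")
```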

HPC-focused systems

High-performance computing has similar issues, but it can use traditional parallel file systems that are capable of streaming large data sets. While the two storage systems might end up looking similar physically, an HPC-focused system is more likely to use a Lustre solution, whereas an AI system might use an AI-specific storage solution such as the one provided by Weka, along with an S3-compliant object storage tier.

Storage tiering

Storage tiering is another area that companies need to consider with their system, since it helps minimize cost and maximize availability, performance and recovery. However, not all storage tiering is equal. The key to tiering is to keep things as cost-effective as possible without suffering a performance penalty. Keep in mind that not every system needs Ferrari-like storage. An optimized system will help make sure your project has enough space for hot data, balancing the rest with less expensive data storage to meet regulatory or persistence requirements as needed.
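As a simple illustration of a tiering policy, the sketch below buckets files into hot, warm and cold tiers by how recently they were accessed. The thresholds, tier names and dataset path are purely hypothetical, and a production policy would normally live in the storage platform itself rather than in a script.

```python
# Minimal sketch of an access-age-based tiering policy: recently touched data
# stays on the fast ("hot") tier, older data is a candidate for cheaper tiers.
# Thresholds, tier names, and the dataset path are illustrative only.
import time
from pathlib import Path

HOT_DAYS, WARM_DAYS = 7, 90   # example policy thresholds

def classify(path: Path) -> str:
    """Bucket a file by days since last access.
    Note: on filesystems mounted with noatime, st_mtime is a better signal."""
    age_days = (time.time() - path.stat().st_atime) / 86400
    if age_days <= HOT_DAYS:
        return "hot (NVMe/flash tier)"
    if age_days <= WARM_DAYS:
        return "warm (capacity tier)"
    return "cold (object/archive tier)"

# Walk a dataset directory and report how much data belongs on each tier.
totals: dict[str, int] = {}
for f in Path("/data/training").rglob("*"):   # hypothetical dataset path
    if f.is_file():
        tier = classify(f)
        totals[tier] = totals.get(tier, 0) + f.stat().st_size

for tier, size in totals.items():
    print(f"{tier}: {size / 1e12:.2f} TB")
```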

Key Consideration #3: Networking

Consider leading-edge technology that can help you get the best possible I/O for all that data. Two examples are below:

NVIDIA GPUDirect®

When moving data through an AI or ML algorithm, or training a neural network, you need the highest data throughput possible. GPUs are able to consume data much faster than CPUs, and as GPU computing power increases, so does the demand for I/O bandwidth. NVIDIA GPUDirect® can enhance data movement and access for NVIDIA GPUs. With GPUDirect, network adapters and storage drives can read and write directly to and from GPU memory. This eliminates unnecessary memory copies, decreases CPU overhead, and reduces latency, resulting in significant performance improvements. Through a comprehensive set of APIs, customers can access GPUDirect Storage, GPUDirect Remote Direct Memory Access (RDMA), GPUDirect Peer to Peer (P2P) and GPUDirect Video.
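For a sense of what this looks like in practice, here is a minimal sketch that reads a file straight into GPU memory, assuming NVIDIA’s kvikio Python bindings for the cuFile/GPUDirect Storage stack are installed. Whether the transfer truly bypasses the CPU bounce buffer depends on the driver, filesystem and hardware in the path; kvikio falls back to a compatibility mode otherwise.

```python
# Minimal sketch of reading a file directly into GPU memory via GPUDirect
# Storage, assuming NVIDIA's kvikio Python bindings for cuFile are installed
# and a CUDA-capable GPU is present. The file path is hypothetical.
import cupy
import kvikio

path = "/mnt/fast/train_shard_000.bin"   # hypothetical shard on a GDS-capable filesystem

buf = cupy.empty(256 * 1024 * 1024, dtype=cupy.uint8)   # 256 MiB GPU buffer
f = kvikio.CuFile(path, "r")
nbytes = f.read(buf)    # transfer from storage into device memory
f.close()
print(f"Read {nbytes / 1e6:.0f} MB directly into GPU memory")
```

The same capability is exposed to C and C++ applications through the cuFile API that GPUDirect Storage provides.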

Connect-IB InfiniBand

Connect-IB InfiniBand adapter cards from Mellanox provide the highest-performing and most scalable interconnect solution for server and storage systems. Maximum bandwidth is delivered across PCI Express 4.0, leveraging HDR 100 or 200 Gbps InfiniBand, together with consistently low latency across all CPU cores. These adapters also offload protocol processing and data movement from the CPU to the interconnect, maximizing CPU efficiency and accelerating parallel and data-intensive application performance. They support data operations such as noncontiguous memory transfers, which eliminate unnecessary data copy operations and CPU overhead. Storage nodes also see improved performance with the higher bandwidth, and standard block-and-file access protocols can leverage InfiniBand RDMA for even more performance.
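To gauge how much fabric a storage node actually needs, a quick calculation like the one below can help. The drive count, per-drive throughput and protocol-overhead factor are illustrative assumptions, not vendor figures; only the HDR 200 Gbps signaling rate is taken from the paragraph above.

```python
# Quick arithmetic on fabric sizing: how much of an HDR InfiniBand link a
# storage node can actually fill. Figures are illustrative assumptions.
import math

hdr_link_gbps   = 200                        # HDR InfiniBand rate per port, Gb/s
usable_gb_per_s = hdr_link_gbps / 8 * 0.9    # assume ~10% protocol overhead

nvme_drives_per_node = 8
gb_per_s_per_drive   = 3.0                   # sustained sequential read, illustrative

node_storage_gb_per_s = nvme_drives_per_node * gb_per_s_per_drive
ports_needed = math.ceil(node_storage_gb_per_s / usable_gb_per_s)

print(f"Usable per HDR port: ~{usable_gb_per_s:.1f} GB/s")
print(f"Storage node can serve: ~{node_storage_gb_per_s:.0f} GB/s")
print(f"HDR ports needed to avoid a network bottleneck: {ports_needed}")
```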

Over the next few weeks we’ll explore Silicon Mechanics’ new insideHPC Guide:

Download the complete “How Expert Design Engineering and a Building Block Approach Can Give You a Perfectly Tailored AI, ML or HPC Environment,” courtesy of Silicon Mechanics.