When it comes to drug discovery, research is everything. The ability to accelerate research can make or break a business and, of course, save lives. A few key design considerations in HPC infrastructure can make a dramatic difference in performance and maximize your cluster’s value.
How do you maximize HPC/AI performance?
We recommend you start with commodity off-the-shelf (COTS) components and best-of-breed technologies from trusted partners. This gets you more nodes for your money and avoids vendor lock-in, with its associated risk of being trapped into higher prices without a corresponding improvement in performance. The result is better return on investment, greater flexibility, and more performance per dollar.
Using composable disaggregated infrastructure (CDI) can also significantly benefit performance. In short, CDI uses software to pool hardware resources so they can be dynamically combined to meet shifting workload needs. As a result, you can run diverse projects on a single cluster while still optimizing for each unique workload.
You should also ensure your cluster is designed to scale easily with your evolving workload. With CDI, you can dynamically scale individual resource pools such as GPUs, accelerators, or storage, seamlessly adding resources to the cluster and pulling them under management. Design for linear scalability, so that cluster performance and storage reliability improve with each node you add.
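To make the idea concrete, here is a minimal Python sketch of CDI-style composition, assuming hypothetical ResourcePool and compose_node abstractions rather than any vendor’s actual API: hardware lives in shared pools and is bound to a logical node on demand, then returned when the workload finishes.

```python
# Hypothetical sketch of CDI-style composition; names are illustrative,
# not a real CDI vendor API.
from dataclasses import dataclass

@dataclass
class ResourcePool:
    kind: str   # e.g. "gpu", "nvme"
    free: int   # units currently available in the shared pool

    def claim(self, n: int) -> int:
        if n > self.free:
            raise RuntimeError(f"pool '{self.kind}' exhausted")
        self.free -= n
        return n

    def release(self, n: int) -> None:
        self.free += n

def compose_node(pools: dict[str, ResourcePool],
                 request: dict[str, int]) -> dict[str, int]:
    """Bind resources from shared pools into one logical node."""
    return {kind: pools[kind].claim(n) for kind, n in request.items()}

pools = {
    "gpu":  ResourcePool("gpu", 16),   # 16 GPUs in the shared pool
    "nvme": ResourcePool("nvme", 64),  # 64 NVMe drives in the shared pool
}

# A training job composes a GPU-heavy node; releasing it later returns
# the hardware to the pools for, say, a storage-heavy docking workload.
train_node = compose_node(pools, {"gpu": 8, "nvme": 4})
print(train_node, "| GPUs left in pool:", pools["gpu"].free)
```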
Even if you don’t use CDI, a cluster built on a modular design lets you scale the compute, storage, and networking components independently as needed. Read more about the benefits of modular HPC/AI design in this white paper about the Atlas AI Cluster.
How can you improve HPC/AI data management and ROI?
Traditional NAS solutions top out around 10 Gb/s before they become a bottleneck, which leads to exceedingly long epoch times for drug discovery workloads. You can use fast local NVMe storage as a temporary fix, but then data must constantly be copied between nodes, and the resulting network traffic becomes a bottleneck of its own.
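A quick back-of-envelope calculation shows why that 10 Gb/s ceiling hurts. The 10 Gb/s figure comes from the discussion above; the 2 TB training set and the 100 Gb/s alternative are illustrative assumptions, not benchmark results.

```python
# Illustrative epoch-time math: 10 Gb/s NAS vs. a faster fabric,
# assuming a 2 TB training set streamed once per epoch.
DATASET_TB = 2.0

def io_hours(bandwidth_gbps: float) -> float:
    """Hours to stream the full dataset once at a given line rate (Gb/s)."""
    gigabits = DATASET_TB * 1000 * 8    # TB -> GB -> Gb
    return gigabits / bandwidth_gbps / 3600

print(f"NAS @ 10 Gb/s : {io_hours(10):.2f} h of I/O per epoch")
print(f"SDS @ 100 Gb/s: {io_hours(100):.2f} h of I/O per epoch")
```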
Software-defined storage (SDS) is a strong solution for data management. By pairing commodity servers with SDS, you can improve I/O bandwidth by up to 10x over traditional NAS. SDS also eliminates constant node-to-node copying by creating a single-namespace data lake that every compute node can read from and write to. You can further reduce total cost of ownership by tiering to S3-compatible object storage, on-premises or in the public cloud.
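As a rough illustration of the tiering idea, here is a minimal Python sketch that moves cold files to an S3-compatible object store with boto3. The bucket name, directory, and 30-day threshold are all assumptions, and a production SDS would do this with built-in policies; this just shows the mechanism.

```python
# Minimal cold-data tiering sketch: files untouched for 30 days move
# from hot NVMe to an S3-compatible object tier. Paths and bucket name
# are placeholders.
import os
import time
import boto3

COLD_AFTER_DAYS = 30
client = boto3.client("s3")  # for on-prem S3-compatible stores, pass
                             # endpoint_url=... when creating the client

def tier_cold_files(hot_dir: str, bucket: str = "cold-tier") -> None:
    cutoff = time.time() - COLD_AFTER_DAYS * 86400
    for name in os.listdir(hot_dir):
        path = os.path.join(hot_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            client.upload_file(path, bucket, name)  # copy to object tier
            os.remove(path)                         # free hot NVMe capacity

tier_cold_files("/mnt/hot/datasets")
```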
How can you improve HPC/AI efficiency?
Purpose-built clusters can dramatically improve drug discovery time-to-result. Consider CDI: it lets your teams work without hitting resource bottlenecks, scales seamlessly, and maintains a high level of security.
Bare-metal performance (or its equivalent with CDI) is ideal. If you are considering deploying an AI solution to the cloud, though, you ideally want your compute and data in the same data center. If you run a hybrid model, the resources that incur the greatest costs should live on-premises.
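To make the placement advice concrete, a simple break-even calculation shows when an on-premises GPU pays for itself versus on-demand cloud. All prices below are made-up placeholders; substitute your own quotes.

```python
# Illustrative break-even arithmetic for hybrid placement.
CLOUD_GPU_PER_HOUR = 3.00     # $/GPU-hour, hypothetical on-demand rate
ONPREM_GPU_CAPEX   = 30000.0  # $ per GPU incl. server share, hypothetical
ONPREM_LIFE_YEARS  = 4

break_even_hours = ONPREM_GPU_CAPEX / CLOUD_GPU_PER_HOUR
utilization = break_even_hours / (ONPREM_LIFE_YEARS * 365 * 24)

print(f"Break-even at ~{break_even_hours:,.0f} GPU-hours "
      f"(~{utilization:.0%} utilization over {ONPREM_LIFE_YEARS} years)")
```

In other words, the more heavily utilized a resource is, the stronger the case for keeping it on-premises.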
You can also improve efficiency through networking by pairing NVMe storage with a high-bandwidth fabric; we recommend NVIDIA/Mellanox 200 Gb/s HDR InfiniBand. Your storage should also be highly parallel and support a large single namespace.
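For a sense of scale, here is a rough transfer-time comparison at those link rates, assuming an illustrative 1 TB checkpoint and ignoring protocol overhead, so real numbers will run somewhat higher.

```python
# Rough transfer-time comparison for a 1 TB file at two link rates.
SIZE_TB = 1.0

for label, gbps in [("10 GbE", 10), ("HDR InfiniBand (200 Gb/s)", 200)]:
    seconds = SIZE_TB * 1000 * 8 / gbps   # TB -> Gb, divided by Gb/s
    print(f"{label:>26}: {seconds:6.0f} s (~{seconds / 60:.1f} min)")
```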
To learn more about HPC/AI for drug discovery, watch our on-demand webinar.