In this video from the HPC User Forum in Milwaukee, Gabriel Broner from Rescale presents: Will HPC Move to the Cloud.
“As years have passed, HPC has transitioned from unique and proprietary designs, to clusters of many dual-CPU Intel nodes. Vendors’ products are now differentiated more by packaging, density, and cooling than the uniqueness of the architecture. In parallel, cloud computing has gained momentum in the larger IT industry. Intel is now selling more processors to run in the cloud than in company-owned facilities, and cloud is starting to drive innovation and efficiencies at a rate faster than on premises.”
High performance computing has evolved on premises. You buy a computer for a few million dollars, and you are able to run simulations to reduce your innovation time and time to market for your products. The auto manufacturer depicted in Figure 1 represents the new dilemma faced in buying such an in-house HPC system. With the workload this company has, what size system should they buy? If they buy a system that accommodates the peak workload, they may have to spend around $20M, but the system will be only 20% utilized. If they buy a $4M system, the system will be highly utilized, but large jobs cannot be run, and other jobs will wait in a queue, potentially for days, before they run, delaying innovation and time to market.
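To make the sizing dilemma concrete, here is a minimal sketch of the arithmetic, using only the $20M price and 20% utilization quoted above. The function names are hypothetical, and the model is deliberately simplistic: it treats the purchase price as spread evenly over capacity and ignores power, staffing, and depreciation.

```python
def idle_spend(purchase_cost_usd: int, utilization_pct: int) -> int:
    """Portion of the purchase price paying for capacity that sits idle.

    Simplifying assumption: cost is spread evenly across capacity.
    """
    if not 0 <= utilization_pct <= 100:
        raise ValueError("utilization_pct must be between 0 and 100")
    return purchase_cost_usd * (100 - utilization_pct) // 100


def used_spend(purchase_cost_usd: int, utilization_pct: int) -> int:
    """Portion of the purchase price paying for capacity actually used."""
    return purchase_cost_usd - idle_spend(purchase_cost_usd, utilization_pct)


# Peak-sized system from the example: about $20M, 20% utilized.
peak_cost = 20_000_000
peak_utilization = 20

print(f"Idle spend: ${idle_spend(peak_cost, peak_utilization):,}")
print(f"Used spend: ${used_spend(peak_cost, peak_utilization):,}")
```

Under these assumptions, roughly $16M of the peak-sized purchase pays for idle capacity, and the $4M that is actually used matches the price of the smaller system. The smaller system avoids the idle spend but cannot run the largest jobs at all.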
Like the previous disruptions of clusters vs. monolithic systems or Linux vs. proprietary operating systems, cloud changes the status quo, takes us out of our comfort zone, and gives us a sense of lost control. But the effect of price, the flexibility to dynamically change your system size and choose the best architecture for the job, the availability of applications, the ability to select system cost based on the needs of a particular workload, and the ability to provision and run immediately will prove very attractive for HPC users. It may be time to start thinking about HPC in the cloud in your organization!
The nature of cloud disruption is unique. It’s not all or nothing, and you can dip your toe in the water. If you follow the traditional processes and buy another on-premises system, you will miss the advantages of cloud. Cloud gives us an opportunity to test the benefits of the future without committing to the next multi-million-dollar purchase. If you spend $100K, you can start immediately, testing HPC in the cloud and accessing the latest architectures available. If the next HPC system in 3 to 5 years will be in the cloud or will be a hybrid system, testing it now, learning from it, and iterating will reduce risk and enable a much smoother transition. So, in addition to thinking about cloud, I encourage you to test the future starting next week!
Gabriel Broner has been in the HPC industry for 25 years. He has held roles as operating systems architect at Cray, VP & GM of HPC at SGI/HPE, head of innovation at Ericsson, and GM at Microsoft. Gabriel joined Rescale as VP & GM of HPC in July 2017.
In the end, though, the problem is price. I can see HPC in the Cloud making financial sense from a value perspective for small sites, but the value proposition is much less clear to me for sites with medium-to-large scale HPC deployments.
I think about it this way: Cloud systems leverage bursty workloads so you only pay for what you use – they’re great if:
1. You have uneven demand and no other way to aggregate workloads.
2. You don’t have the staff or physical infrastructure to maintain a physical system.
Both of these conditions make HPC in the Cloud appealing for small-scale users.
Larger sites that already support a good mix of HPC workloads have already addressed this, however, by aggregating and scheduling workloads with batch scheduling systems. In addition, these sites have frequently already captured the significant staffing economies of scale that come from running datacenters and administering systems. As a result, large-scale HPC systems tend to run in an equilibrium state, which removes much of the pure cost/benefit financial incentive of cloud systems.
That said, there are incentives for larger sites for HPC in the Cloud; they’re just different incentives than a straight cost-based value proposition. In particular, cost *predictability* (will I have to replace a CRAC or UPS this year?) and disaster recovery are potentially big motivations for these sites. I rarely see these discussed, however.