Joe Landman comments on his blog about one of the problems with cloud computing at the high end: data in motion. He references a news release at HPCwire from a team using Google’s CluE for computational genomics work.
We don’t see enough of these cost-benefit analyses when people talk about cloud computing. Sure the remote resources are there and usable. But if you spend so much time or cost to move your data … is the low cost of the computing cycle still worth it?
…external clouds are being marketed at small as well as large companies. These only make sense if you can move the data once. That is, pay the data motion cost, store it at Tsunamic’s site, or Amazon, or CRL. Then do all your operations there as well. But these models … move the data there and let it rest there … isn’t what is being pushed.
Cloud computing can work. It is effectively ASP v2.0 (if you don’t know what ASP v1.0 was, don’t worry, you aren’t missing much). It’s mostly there. The one thing that is missing to make it really work, to uncork the bottle and really let the djinni out … is low cost bandwidth.
Joe’s analysis is right, and his comments got me thinking. My take is that people aren’t careful enough about quantifying what they are imagining when they talk about the promise of remotely hosted computation. HPC centers have operated on a remotely hosted computation model for decades, and I think the same cost model applies to them as applies to the cloud computing solutions of today. What’s new is that a marketplace has arisen where you can pay money to compute, rather than having to be a member of a relatively elite organization with access to an HPC center. This translates the heretofore theoretical costs of data movement into real dollars, which makes Joe’s equation hit the pocketbook, but the issues haven’t changed.
The problem of data movement and locality is one we deal with every day with users of the DoD Mod Program’s six large centers (spread all over the country), and we have a special-purpose network interconnecting our centers and many of our users. Still, our users burn hundreds of millions of CPU hours, and most of them (greater than 75% of 1,000 users) are not even in the same state as the supers they use.
Our biggest users (multi-terabyte makers) keep it all in our center: after computing they do analysis on our resources, and then file it all away on our mass storage servers, even with a (relatively) dedicated nationwide network. Low cost bandwidth will address the financial costs of data movement, and low cost high bandwidth will address the opportunity costs for mid-market HPC users. I don’t think there will ever be (even approximately) enough bandwidth for supercomputing users, though, and this is perhaps a good argument against the relevance of clouds at the very high end IF users need to keep the data or, worse, keep a local copy of the answers they create. If users can simulate, analyze, and then destroy, then commercially hosted computation could still be viable at the high end for user communities interested in production access to HPC.
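To see why bandwidth dominates the picture for multi-terabyte makers, a back-of-envelope transfer-time calculation is enough. The link speeds, dataset sizes, and efficiency factor below are illustrative assumptions, not figures from our centers:

```python
# Back-of-envelope: hours needed to move a multi-terabyte dataset over
# various wide-area links. All numbers are illustrative assumptions.

def transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Hours to move dataset_tb terabytes over a link_gbps link, assuming
    only `efficiency` of the nominal bandwidth is achieved end-to-end."""
    bits = dataset_tb * 1e12 * 8               # terabytes -> bits
    effective_bps = link_gbps * 1e9 * efficiency
    return bits / effective_bps / 3600         # seconds -> hours

for tb in (1, 10, 50):
    for gbps in (1, 10):
        print(f"{tb:>3} TB over {gbps:>2} Gb/s link: {transfer_hours(tb, gbps):7.1f} hours")
```

Even at an (assumed) 70% of a 10 Gb/s link, tens of terabytes take most of a day to move; repeat that for every job and the opportunity cost of data motion quickly swamps any savings on cycles.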
It would be interesting to compare a large federal program’s TCO for a supercomputer (power, cooling, administration, site prep, installation/deinstallation, etc.) to the cost of buying the same capability from a commercial provider (production-oriented programs, obviously; DOE Office of Science is not the target here). And then to factor in the added value the vendor would be able to offer (scheduled capability upgrades, SLAs for total cycles available, and so on). I wonder how the numbers would work out?
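The comparison could be sketched as a simple cost model. Every figure below is a placeholder assumption chosen only to show the shape of the calculation; the exercise is meaningless until real program numbers are plugged in:

```python
# Sketch of the proposed TCO comparison: owning a supercomputer vs. buying
# equivalent capability from a commercial provider. All inputs are
# hypothetical placeholders, not real program figures.

def in_house_tco(hw_cost, years, power_cooling_per_year, staff_per_year,
                 site_prep, install_deinstall):
    """Total cost of ownership over the machine's service life, in dollars."""
    return (hw_cost + site_prep + install_deinstall
            + years * (power_cooling_per_year + staff_per_year))

def hosted_cost(cpu_hours_per_year, years, price_per_cpu_hour, data_motion_per_year):
    """Cost of buying the same cycles from a provider, including data movement."""
    return years * (cpu_hours_per_year * price_per_cpu_hour + data_motion_per_year)

# Hypothetical numbers (assumptions for illustration only):
in_house = in_house_tco(hw_cost=20e6, years=4, power_cooling_per_year=1.5e6,
                        staff_per_year=2e6, site_prep=3e6, install_deinstall=0.5e6)
hosted = hosted_cost(cpu_hours_per_year=100e6, years=4,
                     price_per_cpu_hour=0.10, data_motion_per_year=0.5e6)

print(f"In-house 4-year TCO: ${in_house / 1e6:.1f}M")
print(f"Hosted 4-year cost:  ${hosted / 1e6:.1f}M")
```

The vendor-side added value (scheduled upgrades, SLAs on delivered cycles) would show up as adjustments to the hosted price rather than new terms, which is what makes the comparison tractable once honest inputs exist.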