When Cloud HPC Adds Up

Over at the Thinking Out Loud blog, Adam DeConinck from R Systems writes that AWS HPC clusters are definitely useful for “bursting” loads and certain classes of problems, though they still have a few issues to resolve before they can replace a “traditional” cluster.

EC2 also doesn’t get the same I/O performance you can get on bare metal. This one’s a problem for lots of people, including big web sites, and it matters in HPC too. A lot of HPC installations have big parallel filesystems that stripe over many disks, like Lustre. It’d be interesting to see what you could do running Lustre on EC2, but I think using EBS as the backing storage would make it somewhat painful. Much nicer to use big I/O nodes attached to InfiniBand. But you notice how much specialized hardware we’re talking about here? Lots of big I/O nodes, a specialized network where even IP is a second-class citizen… it all makes sense if you do HPC all the time, but if you only need to run for a few months out of the year it can seem like overkill. Especially if you are in fact running embarrassingly parallel models (and really, a whole lot of them are).

