Greg Pfister has a blog post that I think will get a reaction or two from you guys, since judging by past posts you are always interested in arguing the relative merits of virtualization when it comes to HPC. The entire post is a short but interesting read, and I commend it to your day’s reading. Here is the gist of Greg’s primer:
What hypervisors (really, virtual machines; hypervisors are one implementation of that notion) do is more than consolidation. Consolidation is, to be sure, the killer app of virtualization; it’s what put virtualization on the map.
But hypervisors, in particular, do something else: They turn a whole system configuration into a bag of bits, a software abstraction decoupled from the hardware on which they are running. A whole system, ready to run, becomes a file. You can store it, copy it, send it somewhere else, publish it, and so on.
As Greg points out in his post, this means that you can stop your virtual machine with a job still running, move it somewhere else, and start it up again. That probably carries too much overhead today to be part of preemptive scheduling, but it is useful for not losing jobs to maintenance events (or having to drain the queues beforehand).
…The traditional performance cost of virtualization is anathema to HPC, too. But that’s trending to near zero. With appropriate hardware support (Intel VT, AMD-V, similar things from others) that’s gone away for processor & memory. It’s still there for IO, but can be fixed; IBM zSeries native IO has had essentially no overhead for years. The rest of the world will have to wait for PCIe to finish finalizing its virtualization story and IO vendors to implement it in their devices; that will come about in a patchy manner, I’d predict, with high-end communication devices (like InfiniBand adapters) leading the way.
So that’s what virtualization gets you: isolation (for consolidation) and abstraction from hardware into a manipulable software object (for lots else). Also, security, which I didn’t get into here.
As I read it, that adds up to not much for most HPC today, but a lot of value for commercial computing.
And of course all of this is traditional, “forward” virtualization. ScaleMP’s approach to virtualization, creating one machine out of many physical servers as we wrote about here, presents a whole new set of opportunities and challenges.