Josh Simons at Sun recently took a trip to Oak Ridge National Laboratory (ORNL) to talk with their System Research Team about the work both organizations are doing on virtualization in HPC. His full trip report is an interesting read.
Uses for virtualization in HPC? From Josh’s report:
[Resiliency] In addition, clusters are getting larger. Much larger, even with fatter nodes. Which means more frequent hardware failures. Bad news for MPI, the world’s most brittle programming model. Certainly, some more modern programming models would be welcome, but in the meantime what can be done to keep these jobs running longer in the presence of continual hardware failures?
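An aside on the resiliency point, because it is easy to see in code why MPI earns that label. The sketch below is mine, not anything from Josh’s report: even if you switch off the default MPI_ERRORS_ARE_FATAL handler, the standard hands you an error code and nothing else, which is exactly why people look below the application, to a hypervisor that can checkpoint or migrate a whole guest.

```c
/* Minimal sketch (my illustration, not from the trip report) of MPI's
 * fault-tolerance ceiling. Compile with mpicc, run with mpirun. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* The default handler is MPI_ERRORS_ARE_FATAL: one failed rank
     * aborts every rank. MPI_ERRORS_RETURN is the only portable
     * alternative, and it merely reports the failure. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    /* Pass a token around a ring. If a peer has died, we get an
     * error code back, but MPI_COMM_WORLD is left in a state the
     * standard gives us no portable way to repair. */
    int token = rank, incoming;
    int rc = MPI_Sendrecv(&token, 1, MPI_INT, (rank + 1) % size, 0,
                          &incoming, 1, MPI_INT, (rank + size - 1) % size, 0,
                          MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    if (rc != MPI_SUCCESS)
        fprintf(stderr, "rank %d: peer unreachable, no recovery path\n", rank);

    MPI_Finalize();
    return 0;
}
```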
[Scaling] Among them, the use of multiple virtual machines per physical node to simulate a much larger cluster for demonstrating an application’s basic scaling capabilities in advance of being allowed access to a real, full-scale (and expensive) compute resource.
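The mechanics here are mundane, which is rather the point. Here is a hedged sketch of what “multiple VMs per physical node” can look like with the libvirt C API; the guest names, the sizing, and the assumption of a prebuilt disk image are all mine, and a real setup would also need storage and networking wired into the XML.

```c
/* Hedged sketch: boot GUESTS_PER_NODE KVM guests on one physical host
 * so a job launcher sees several "nodes". The domain XML below is the
 * bare minimum libvirt will parse; a usable guest also needs a disk
 * image and a network interface. Link with -lvirt. */
#include <libvirt/libvirt.h>
#include <stdio.h>

#define GUESTS_PER_NODE 4  /* my assumption; tune to cores and memory */

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    if (conn == NULL) {
        fprintf(stderr, "could not reach the hypervisor\n");
        return 1;
    }

    for (int i = 0; i < GUESTS_PER_NODE; i++) {
        char xml[512];
        snprintf(xml, sizeof xml,
                 "<domain type='kvm'>"
                 "  <name>vnode%d</name>"
                 "  <memory unit='MiB'>1024</memory>"
                 "  <vcpu>1</vcpu>"
                 "  <os><type arch='x86_64'>hvm</type></os>"
                 "</domain>", i);

        /* Transient domain: exists only until it shuts down. */
        virDomainPtr dom = virDomainCreateXML(conn, xml, 0);
        if (dom == NULL)
            fprintf(stderr, "vnode%d failed to start\n", i);
        else
            virDomainFree(dom);
    }

    virConnectClose(conn);
    return 0;
}
```

Point an MPI hostfile at vnode0 through vnode3 and the application gets its shakeout across four “nodes” before it ever touches the real, full-scale machine.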
[System-level portability] Geoffroy also spoke about “adapting systems to applications, not applications to systems” by which he meant that virtualization allows an application user to bundle their application into a virtual machine instance with any other required software, regardless of the “supported” software environment available on a site’s compute resource.
[Observability] Observability was another simpatico area of discussion. DTrace has taken low-cost, fine-grained observability to new heights (new depths, actually). Similarly, SRT is looking at how one might add dynamic instrumentation at the hypervisor level to offer a clearer view of where overhead is occurring within a virtualized environment to promote user understanding and also offer a debugging capability for developers.
Performance addicts (gotta have that last op…GOTTA HAVE IT!!) will pooh-pooh the idea of losing cycles to the VM. I can already hear them gearing up to point out my foolish carelessness with their hard-earned MADDs. My own personal bias is that this point of view is irrelevant to the point of irresponsibility.
We already let between 80% and 99% of a machine’s available capability fall on the floor, and spend thousands of man-hours trying to do as much as possible with the few percentage points we can get. In doing so we have totally ravaged the concept that people (with their ability to reason) are more valuable than machines. Time spent making supercomputing more usable and more accessible, on everything from better programming tools to interfaces that support the end user, will bring more people into HPC, who in turn will make HPC better and use HPC to improve more areas of everyday life. That is the path to realizing the promise of HPC over the next two decades: not 10, 100, and 1,000 PF machines.
So, I think VMs (and UIs and IDEs and APIs) are relevant to the degree that they support the creation of environments that let users and programmers worry more about the task they are accomplishing than about the tool they use to accomplish it. A FLOP or two be damned. There, I said it.
Oh, and Josh’s post ends with a VMs-in-HPC reading list. Check it out.