Research team tests virtual machine monitor on 4,096 nodes of Red Storm

Physorg.com is one of many places carrying news of a recent experiment to run the Palacios virtual machine monitor on over 4,000 nodes, the largest test of the software to date.

Results show that the team successfully virtualized Red Storm using the Palacios virtual machine monitor and ran communication-intensive, fine-grain parallel benchmarks of critical interest to Sandia with extremely high performance. Testing went up to 4,096 nodes, making this the largest-scale study by at least two orders of magnitude.

“Virtualizing a parallel supercomputer is particularly challenging because of the need to support extremely low latency, high-bandwidth communication among thousands of virtual machines,” Dinda says. “Supercomputing users and the owners of supercomputers will not tolerate any performance compromises because the machines are so expensive to acquire and maintain, but, on the other hand, they also want access to the benefits of virtualization.”

Virtualization in HPC is usually a topic that brings out the lovers and the haters here at insideHPC. I think the system and application performance management benefits that you can get from some kind of active runtime system will lead us down this path sooner rather than later, especially with the scale problems we’ll be facing by the end of the decade. In this experiment Palacios had less than 5 percent overhead, which is clearly manageable on systems that often don’t deliver more than 3 to 5 percent of peak performance to applications anyway.

Virtualization on such a machine is important because it will allow more researchers to run scientific computing and simulation programs without reconfiguring their software to the machine’s specific hardware and software environments. In this context, thousands of virtual machines must cooperate in order to solve large problems. But because the system is extremely expensive to run, any VMM must have low overhead, and whatever overhead it does add is magnified by the fine-grain interactions among the virtual machines.

More in the story at Physorg.