Article explores answers for challenges faced by virtualization in HPC

March 5, 2009 by Doug Black

Twitter user LeahRosin yesterday tweeted a link to this article at SearchEnterpriseLinux.com: “Virtualization for high-performance computing on Linux: Critiques and challenges.”

In my tip Using virtualization to reinvent high-performance computing on Linux, I called out a range of applications and benefits that virtualization can bring to high-performance computing (HPC). But the question is, despite these compelling cases, why don’t we see more pervasive use of virtualization in HPC?

Well, you may have heard this: There ain’t no such thing as a free lunch. In some cases, virtualization technology may not (yet) meet legacy HPC requirements; in others, HPC systems providers and deployers are not comfortable departing from familiar (and expensive) technology acquisition paths and roadmaps. This tip outlines a number of perceived roadblocks to leveraging virtualization and explains how virtualization can actually be a very good fit for HPC.

The article follows a challenge-and-solution format, noting critiques of virtualization in HPC (jitter/latency, paravirtualization, HPC acquisition, and so on) and providing a mitigating point of view for each. It reads as pro-virtualization to me. I’m not a sysadmin or cluster designer, so I’d be interested to know your reaction to this article.

Comments

Richard Hickey says

March 5, 2009 at 12:52 pm

No offense to this guy, but he is obviously not working in HPC, nor does he truly understand (or at least he comes across as not understanding) what HPC really means. You can tell from the way he throws around buzzwords that make no sense whatsoever in the realm of HPC.

I love this sentence.

“Moreover, HPC deployers seldom considered COTS-based acquisition paths and technologies like virtualization because space and time multiplexing provided by hardware virtualization provides no short-term benefit to HPC users.”

If ya can’t convince em with the truth, dazzle em with bullshxx.

There are places within HPC where virtualization makes a lot of sense, at least to me. IBM has done a wonderful job of it with their pSeries products. It’s there when you want it, and out of the way when you don’t.

Putting virtualization on COTS servers used in an HPC cluster just doesn’t make much sense. Web servers? Sure. LDAP, sendmail, etc.? Sure. Compute nodes? Um, no.

I started to reread the article to cherry pick out funny and obfuscated bull, but the more I reviewed the article the more I realized I would just be preaching to the choir as it were. I really don’t think this author would differentiate between web farms and HPC.

Just my two cents worth.

Richard Hickey

Bill Bryce says

March 6, 2009 at 5:34 pm

The article isn’t that bad. Yes, there is a bit too much jargon in there, as noted in a previous comment. However, there are fundamental problems with virtualization in HPC environments. For example, if you have a COTS cluster with InfiniBand and you are running an MPI application, you are probably getting decent performance from your cluster, including good bandwidth and low latency. So what happens when you try to run hypervisors on your compute nodes and run your MPI app? Well, basically it sucks. The virtualization software stack wasn’t designed to handle MPI over InfiniBand, so it adds a lot of overhead, with the result that the apps do not run well. Now if only someone would fix this problem, then it would be interesting.

Jeff Squyres says

March 7, 2009 at 8:16 am

I tend to agree with the first commenter — there are so many
buzzwords in that article that it is quite hard to pin down exactly
what the author is trying to say. I had difficulty trying to parse
exactly what each “solution” *really* meant. Most of them are quite
vague and very high level, and (for me, at least) lead to further
arguments.

Don’t get me wrong, I’m sure that there are a few nuggets of Goodness
that you could tease from virtualization in HPC. For example, the
age-old “you can have a custom OS for every job” sounds quite
compelling. Another touted benefit is automatic checkpoint/restart
and process migration.

However, I contend that these technologies already exist (at least in
some form — I wouldn’t say that these are “solved” problems yet) in
dramatically less complex forms. Virtualization adds no tangible
benefit for these that I can see, while adding many more software
and hardware layers that increase the complexity of the issue.

Instead of promoting a specific technology and all the features that
*may* be useful, I prefer to look at each feature and see if a) users
want it, and b) what back-end technology(ies) is(are) candidates to
provide that feature. Specifically: I don’t espouse the “I have a
hammer, so everything looks like a nail” approach — I espouse the “I
have a task to accomplish; what kind of tool should I use?” approach.

Here are a few points in no particular order:

1. Simple (and battle-proven) tools for changing the OS on a node
already exist. Systems like Perceus (and others) can load a new OS on
the fly with existing, well-proven technologies. There’s a little
work to do to tie them into the scheduler (e.g., reserve some nodes,
reboot them to your favorite OS, then actually launch your job), but
it’s very do-able and significantly less complicated than, for
example, leveraging OS-bypass networks through virtualization (as
noted by the second commenter). It should be noted that a similar
amount of work will be needed to make a virtualized solution swap the
OS on a per-job basis.

2. #1 notwithstanding, I haven’t heard many real-world HPC users ask
for the ability to change the OS just for their job. My experience
has been that users have enough difficulty compiling and linking their
massively complex applications (sometimes with dozens of underlying
library requirements) …and now you want to vary the OS as well? I’m
actually not trying to be snarky here, but building and running HPC
apps in a homogeneous environment is just *hard*. Adding *more*
software layers to the mix will make it *harder*. Indeed, the HPC
industry and research communities are actively trying to *lower*
HPC’s barrier to entry by simplifying as much as possible. Varying
the OS sounds (to me) like a complicating factor that will run into no
end of user-level confusion and problems.

3. There are still many open issues surrounding virtualizing OS-bypass
networks. The issues are extremely complex, probably requiring
hardware-based solutions to maintain reasonable levels of performance.
That being said, virtualizing TCP isn’t hard and is a fairly
well-solved issue, but usually at the cost of either specialized
network hardware or more CPU cycles if done completely in software
(i.e., more $$$ or less performance).

4. The article talked about swapping out the OS to run an
HPC-optimized OS. But most HPC clusters already run HPC-optimized
OSes anyway. The system administrators have already carefully tuned
the OS that is loaded onto their compute nodes to allow their users’
apps to run with every available CPU cycle.

5. One of the hidden costs of virtualization is the (potentially
large) support burden for the HPC system administrators. With
virtualization, the sysadmins will have to create VMs for a variety of
different OSes, each of which must support (at a bare minimum) the basic
infrastructure of their cluster (e.g., the batch scheduler/resource
manager, the network stack, compilers, custom support
middleware/libraries, MPI installations, etc.). And then keep all of
those VMs up to date every time there’s a software update. It’s
already a very difficult task today to have *one* coherent set of
compute nodes with all the same versions of all the same tools that
the users expect; with virtualization, there will now need to be
multiple coherent sets of (virtual) machines. Is it *possible*?
Sure. Is it desirable? That remains to be seen. Is it cheap? Most
likely not. Finding good HPC sysadmins is already a difficult task;
adding the virtualization skillset into HPC sysadmin requirements will
make the job even tougher.

6. Let’s also not forget that there are hardware infrastructure issues
associated with deploying VMs, particularly for large-scale parallel
jobs (storage and I/O transport systems). Imagine deploying a
specific VM to thousands of nodes just to run a single job. Or
perhaps you pre-stage all possible VMs to every node (there are
several different strategies possible). Sure, it can be done, and it
can likely even be done efficiently. Some HPC clusters may already
have dedicated I/O infrastructure to handle large data movement. But
it’s additional cost and infrastructure for others.

7. On paper, the automatic checkpoint/restart and process migration of
virtualization all sounds great. However, in practice, you either
have to involve the MPI (or whatever communications middleware you’re
using) or abstract the communications stack away and hide all of it in
the OS/network stack. *And* hide all the migration issues, which, by
definition, means that you have to virtualize the network interface
and translate virtual to physical location in real time. This becomes
especially tricky in OS-bypass network stacks such as MX or
OpenFabrics. Throwing my trump card of being an MPI implementer: why
add soooo many more software layers when MPI implementations have [at
least partially] solved the issue already? Several MPI
implementations are capable of at least some flavor of
checkpoint/restart and migration. I contend that adding *more* layers
to the mix both has negligible benefit and needlessly makes the
overall system more complex (and harder to develop, debug, deploy, and
maintain).
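
To put rough numbers on point 6, here is a minimal back-of-envelope
sketch in Python. It is not a model of any real cluster: the image
size, node count, server bandwidth, and number of images are all
assumed placeholder values, and the function names are hypothetical.
It compares pushing one image from a single server to every node with
pre-staging every image on every node.

```python
# Back-of-envelope model only. Every number below is an assumed
# placeholder, not a measurement from any real cluster.


def central_push_seconds(image_gb: float, nodes: int, server_gb_per_s: float) -> float:
    """Seconds for one image server to send a full image copy to every node.

    Assumes the server's outbound bandwidth is the only bottleneck and
    ignores compression, multicast, and peer-to-peer distribution.
    """
    return image_gb * nodes / server_gb_per_s


def prestaged_disk_gb(image_gb: float, num_images: int) -> float:
    """Local disk each node needs if every possible image is pre-staged."""
    return image_gb * num_images


if __name__ == "__main__":
    image_gb = 4.0          # assumed VM image size
    nodes = 2000            # assumed job size ("thousands of nodes")
    server_gb_per_s = 1.0   # assumed aggregate server bandwidth
    num_images = 10         # assumed number of distinct OS images

    push = central_push_seconds(image_gb, nodes, server_gb_per_s)
    disk = prestaged_disk_gb(image_gb, num_images)
    print(f"central push: {push:.0f} s for {nodes} nodes")
    print(f"pre-staging:  {disk:.0f} GB of local disk per node")
```

Under these assumed values, a naive central push takes over two hours
before the job can start, while pre-staging trades that delay for
local disk on every node. Smarter distribution would change the
numbers, but either way it is infrastructure that someone has to
provision.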

All of this being said, I’m not an HPC cluster admin. I talk to a
fair number of users, but I’m not a production HPC cluster support guy
— and my biases are a bit different. So my views above may be a bit
skewed. Perhaps a survey of real world users would help to quantify
this issue: are the touted benefits/features of virtualization needed
by a lot of HPC users? Why not organize a survey and actually ask
people?

As usual, this is all my $0.02. Take it for what it’s worth. 🙂

John West says

March 7, 2009 at 11:12 pm

I want to thank you guys for the thoughtful, detailed comments. I appreciate the time and effort that takes…survey, eh? Not a bad idea. I’ll see if I can work one up.
