Article explores answers for challenges faced by virtualization in HPC

Twitter user LeahRosin yesterday tweeted a link to the article “Virtualization for high-performance computing on Linux: Critiques and challenges.”

In my tip “Using virtualization to reinvent high-performance computing on Linux,” I called out a range of applications and benefits that virtualization can bring to high-performance computing (HPC). But despite these compelling cases, why don’t we see more pervasive use of virtualization in HPC?

Well, you may have heard this: There ain’t no such thing as a free lunch. In some cases, virtualization technology may not (yet) meet legacy HPC requirements; in others, HPC systems providers and deployers are not comfortable departing from familiar (and expensive) technology acquisition paths and roadmaps. This tip outlines a number of perceived roadblocks to leveraging virtualization and explains how virtualization can actually be a very good fit for HPC.

The article is in a “challenge — solution” format, noting critiques of virtualization in HPC (jitter/latency, paravirtualization, HPC acquisition, and so on) and providing a mitigating point of view. It reads as pro-virtualization to me. I’m not a sys admin or cluster designer, so I’d be interested to know your reaction to this article.


  1. No offense to this guy, but he is obviously not working in HPC, nor does he truly understand (or at least he comes across as not understanding) what HPC really means. You can tell by the way he throws around buzzwords that make no sense whatsoever in the realm of HPC.

    I love this sentence.

    “Moreover, HPC deployers seldom considered COTS-based acquisition paths and technologies like virtualization because space and time multiplexing provided by hardware virtualization provides no short-term benefit to HPC users.”

    If ya can’t convince em with the truth, dazzle em with bullshxx.

    There are places within HPC where virtualization makes a lot of sense, at least to me. IBM has done a wonderful job of it with their pSeries products. It’s there when you want it, and out of the way when you don’t.

    Putting virtualization on COTS servers used in an HPC cluster just doesn’t make much sense. Web servers? Sure. LDAP, sendmail, etc.? Sure. Compute nodes? Um, no.

    I started to reread the article to cherry-pick the funny and obfuscated bull, but the more I reviewed it, the more I realized I would just be preaching to the choir, as it were. I really don’t think this author would differentiate between web farms and HPC.

    Just my two cents worth.

    Richard Hickey

  2. The article isn’t that bad. Yes, there is a bit too much jargon in there, as noted in a previous comment. However, there are fundamental problems with virtualization in HPC environments. For example, if you have a COTS cluster with InfiniBand and you are running an MPI application, you are probably getting decent performance from your cluster, including good bandwidth and low latency. So what happens when you try to run hypervisors on your compute nodes and run your MPI app? Well, basically, it sucks: the virtualization software stack wasn’t designed to handle MPI over InfiniBand, so it adds a lot of overhead, with the result that the apps do not run well. Now, if only someone would fix this problem, then it would be interesting.

  3. I tend to agree with the first commenter — there are so many
    buzzwords in that article that it is quite hard to pin down exactly
    what the author is trying to say. I had difficulty trying to parse
    exactly what each “solution” *really* meant. Most of them are quite
    vague and very high level, and (for me, at least) lead to further
    questions rather than answers.

    Don’t get me wrong, I’m sure that there are a few nuggets of Goodness
    that you could tease from virtualization in HPC. For example, the
    age-old “you can have a custom OS for every job” sounds quite
    compelling. Another touted benefit is automatic checkpoint/restart
    and process migration.

    However, I contend that these technologies already exist (at least in
    some form — I wouldn’t say that these are “solved” problems yet) in
    dramatically less complex forms. Virtualization adds no tangible
    benefit for these that I can see, while adding many more software
    and hardware layers that increase the complexity of the issue.

    Instead of promoting a specific technology and all the features that
    *may* be useful, I prefer to look at each feature and see if a) users
    want it, and b) what back-end technology(ies) is(are) candidates to
    provide that feature. Specifically: I don’t espouse the “I have a
    hammer, so everything looks like a nail” approach — I espouse the “I
    have a task to accomplish; what kind of tool should I use?” approach.

    Here’s a few points in no particular order:

    1. Simple (and battle-proven) tools for changing the OS on a node
    already exist. Systems like Perceus (and others) can load a new OS on
    the fly with existing, well-proven technologies. There’s a little
    work to do to tie them into the scheduler (e.g., reserve some nodes,
    reboot them to your favorite OS, then actually launch your job), but
    it’s very do-able and significantly less complicated than, for
    example, leveraging OS-bypass networks through virtualization (as
    noted by the second commenter). It should be noted that a similar
    amount of work will be needed to make a virtualized solution swap the
    OS on a per-job basis.
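    The scheduler tie-in described in point 1 can be sketched in a few lines. This is a hypothetical Python prolog, not a real integration: `provision()` and `node_is_up()` are stand-ins for whatever the provisioning system (Perceus or otherwise) and the node-health check actually expose.

```python
# Hypothetical job prolog: reboot reserved nodes into the requested OS
# image before the scheduler launches the job. provision() and
# node_is_up() are stubs standing in for real provisioner/IPMI calls.
import time

def provision(node, image):
    """Stub: ask the provisioning system to netboot `node` into `image`."""
    print(f"reprovisioning {node} -> {image}")
    return True  # a real version would shell out to the provisioner

def node_is_up(node):
    """Stub: a real version would poll sshd or ping until the node answers."""
    return True

def prolog(reserved_nodes, requested_image, timeout_s=600):
    """Reprovision every reserved node, then wait for all of them to
    come back before returning control to the scheduler."""
    for node in reserved_nodes:
        if not provision(node, requested_image):
            raise RuntimeError(f"provisioning failed on {node}")
    deadline = time.time() + timeout_s
    pending = set(reserved_nodes)
    while pending and time.time() < deadline:
        pending = {n for n in pending if not node_is_up(n)}
        if pending:
            time.sleep(5)
    if pending:
        raise RuntimeError(f"nodes never came back: {sorted(pending)}")
    return sorted(reserved_nodes)

ready = prolog(["n001", "n002"], "centos5-mpi")
```

    Even in stub form, this shows where the real work is: the reboot-and-wait cycle adds minutes of dead time to every job launch, which is the same cost a virtualized per-job OS swap would have to pay.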

    2. #1 notwithstanding, I haven’t heard many real-world HPC users ask
    for the ability to change the OS just for their job. My experience
    has been that users have enough difficulty compiling and linking their
    massively complex applications (sometimes with dozens of underlying
    library requirements)… and now you want to vary the OS as well? I’m
    actually not trying to be snarky here, but building and running HPC
    apps in a homogeneous environment is just *hard*. Adding *more*
    software layers to the mix will make it *harder*. Indeed, the HPC
    industry and research communities are actively trying to *lower*
    HPC’s barrier to entry by simplifying as much as possible. Varying
    the OS sounds (to me) like a complicating factor that will run into no
    end of user-level confusion and problems.

    3. There are still many open issues surrounding virtualizing OS-bypass
    networks. The issues are extremely complex, probably requiring
    hardware-based solutions to maintain reasonable levels of performance.
    That being said, virtualizing TCP isn’t hard and is a fairly
    well-solved issue, but usually at the cost of either specialized
    network hardware or more CPU cycles if done completely in software
    (i.e., more $$$ or less performance).

    4. The article talked about swapping out the OS to run an
    HPC-optimized OS. But most HPC clusters already run HPC-optimized
    OS’s anyway. The system administrators have already carefully tuned
    the OS that is loaded onto their compute nodes to allow their users’
    apps to run with every available CPU cycle.

    5. One of the hidden costs of virtualization is the (potentially
    large) support burden for the HPC system administrators. With
    virtualization, the sysadmins will have to create VMs of a variety of
    different OSs, each of which must support — at a bare minimum — the basic
    infrastructure of their cluster (e.g., the batch scheduler/resource
    manager, the network stack, compilers, custom support
    middleware/libraries, MPI installations, etc.). And then keep all of
    those VMs up to date every time there’s a software update. It’s
    already a very difficult task today to have *one* coherent set of
    compute nodes with all the same versions of all the same tools that
    the users expect; with virtualization, there will now need to be
    multiple coherent sets of (virtual) machines. Is it *possible*?
    Sure. Is it desirable? That remains to be seen. Is it cheap? Most
    likely not. Finding good HPC sysadmins is already a difficult task;
    adding the virtualization skillset into HPC sysadmin requirements will
    make the job even tougher.

    6. Let’s also not forget that there are hardware infrastructure issues
    associated with deploying VMs, particularly for large-scale parallel
    jobs (storage and I/O transport systems). Imagine deploying a
    specific VM to thousands of nodes just to run a single job. Or
    perhaps you pre-stage all possible VMs to every node (there are
    several different strategies possible). Sure, it can be done, and it
    can likely even be done efficiently. Some HPC clusters may already
    have dedicated I/O infrastructure to handle large data movement. But
    it’s additional cost and infrastructure for others.

    7. On paper, the automatic checkpoint/restart and process migration of
    virtualization all sounds great. However, in practice, you either
    have to involve the MPI (or whatever communications middleware you’re
    using) or abstract the communications stack away and hide all of it in
    the OS/network stack. *And* hide all the migration issues, which, by
    definition, means that you have to virtualize the network interface
    and translate virtual to physical location in real time. This becomes
    especially tricky in OS-bypass network stacks such as MX or
    OpenFabrics. Throwing my trump card of being an MPI implementer: why
    add soooo many more software layers when MPI implementations have [at
    least partially] solved the issue already? Several MPI
    implementations are capable of at least some flavor of
    checkpoint/restart and migration. I contend that adding *more* layers
    to the mix both has negligible benefit and needlessly makes the
    overall system more complex (and harder to develop, debug, deploy, and
    maintain).

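    As one concrete illustration of the “dramatically less complex forms” mentioned earlier, checkpoint/restart can live entirely in the application, with no hypervisor and no virtualized NIC. A minimal, hypothetical Python sketch (file name and checkpoint interval are arbitrary assumptions):

```python
# Minimal application-level checkpoint/restart: periodically persist
# solver state so a killed job resumes from the last checkpoint.
import os
import pickle

CKPT = "solver.ckpt"

def load_checkpoint():
    """Resume from the last saved state, or start fresh."""
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "value": 0.0}

def save_checkpoint(state):
    """Write to a temp file, then atomically rename, so a crash
    mid-write cannot leave a corrupt checkpoint behind."""
    tmp = CKPT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CKPT)

state = load_checkpoint()
while state["step"] < 100:
    state["value"] += 0.5          # stand-in for real computation
    state["step"] += 1
    if state["step"] % 10 == 0:    # checkpoint every 10 steps
        save_checkpoint(state)

print(state["value"])  # 50.0 after 100 steps, even across restarts
os.remove(CKPT)        # clean up after the example
```

    A parallel code still has to coordinate the checkpoints across ranks (which is exactly what the MPI-level C/R support mentioned above does), but none of it requires virtualizing the machine underneath.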
    All of this being said, I’m not an HPC cluster admin. I talk to a
    fair number of users, but I’m not a production HPC cluster support guy
    – and my biases are a bit different. So my views above may be a bit
    skewed. Perhaps a survey of real world users would help to quantify
    this issue: are the touted benefits/features of virtualization needed
    by a lot of HPC users? Why not organize a survey and actually ask
    them?

    As usual, this is all my $0.02. Take it for what it’s worth. :-)

  4. John West says:

    I want to thank you guys for the thoughtful, detailed comments. I appreciate the time and effort that takes…survey, eh? Not a bad idea. I’ll see if I can work one up.
