

Your Workload Manager is Your Latest X Factor

In this week’s Sponsored Post, Ian Lumb, Solutions Architect at Univa Corp., explores the importance of an innovative workload manager to overall success. 

For as long as I’ve been around HPC, I’ve heard requirements in the guise of wishes. And when it comes to the workload-management aspect of HPC, common refrains include: “We’d love to enable the share-tree policy, but our scheduler can’t keep up.” “Quota-limited share trees would be ideal … if that were feasible on our cluster.” “Advance reservation is an extremely desirable policy, but it locks up our scheduler.” Sound familiar? Please read on.


In fairness (sorry!), entitlement-based policies such as share tree can be extremely demanding: every pending job’s priority must be recalculated during each scheduling cycle, potentially taking into account an entire hierarchy of projects. Scale that up by the number of users and their jobs, and you can readily appreciate why some schedulers literally can’t keep up; it can take longer to calculate the prioritized shares than the interval between scheduling cycles! Because constraints imposed via quotas demand still more calculations during each cycle, it’s no wonder quota-limited share trees are a non-starter in many cases. Analogous concerns arise with policies such as advance reservation, which carries an implicit requirement for backfill scheduling; it’s not surprising, then, that the pressing need for recurring advance reservations can’t even enter into serious conversations.
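To make that per-cycle cost concrete, here is a toy sketch of a share-tree priority walk. It is entirely illustrative: the node names, share values, and the priority formula are invented for this example and are not Univa Grid Engine’s actual implementation. The point is structural: the whole tree is re-walked every scheduling cycle, so cost grows with the number of users, jobs, and hierarchy depth.

```python
# Toy share-tree sketch (illustrative only; NOT Grid Engine's code).
# Each cycle, every pending job's priority is derived by walking a
# hierarchy of projects and comparing accumulated usage against
# configured share entitlements.

from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    shares: int                  # configured entitlement
    usage: float = 0.0           # accumulated resource usage
    children: list = field(default_factory=list)

def priority(node, parent_share=1.0):
    """Return {leaf name: priority} for the whole tree.

    Invented formula: a leaf's priority rises with its entitlement and
    falls as its usage grows, so under-served users schedule first.
    """
    result = {}
    total = sum(c.shares for c in node.children)
    for child in node.children:
        share = parent_share * child.shares / total
        if child.children:
            result.update(priority(child, share))
        else:
            result[child.name] = share / (1.0 + child.usage)
    return result

# Hypothetical hierarchy: two projects, three users.
root = Node("root", 100, children=[
    Node("projA", 60, children=[Node("alice", 70, usage=5.0),
                                Node("bob", 30, usage=0.5)]),
    Node("projB", 40, children=[Node("carol", 100, usage=2.0)]),
])

prios = priority(root)
# This walk repeats every scheduling cycle; with many users and deep
# hierarchies, the per-cycle cost can exceed the cycle interval itself,
# which is exactly the stall described above.
```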

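Why does advance reservation imply backfill? A minimal sketch, with invented job names and timings (this is not Grid Engine’s actual algorithm), shows the core test: once a reservation pins resources to a future window, a waiting job may jump ahead only if it finishes before that window opens; otherwise the scheduler must leave cores idle.

```python
# Minimal backfill sketch (illustrative only; NOT Grid Engine's code).
# A reservation holds a core from t=100; the scheduler may "backfill"
# shorter jobs into the idle gap only if they complete in time.

def can_backfill(job_runtime, now, reservation_start):
    """A job may run ahead of the reservation only if it finishes
    before the reserved window opens."""
    return now + job_runtime <= reservation_start

# One free core; reservation starts at t=100; current time t=60.
jobs = [("short", 30), ("long", 50)]
runnable = [name for name, runtime in jobs
            if can_backfill(runtime, 60, 100)]
# "short" would end at t=90 and fits the gap; "long" would end at
# t=110, colliding with the reservation, so it must wait.
```

Performing this feasibility check for every candidate job, every cycle, is the extra scheduling work the paragraph above alludes to.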

Unfortunately, these limitations are symptomatic of a much deeper cause: the architecture and implementation of the workload manager’s own internals. Because our enterprise customers consistently demand improved performance from every component of their HPC infrastructure, we at Univa have been well motivated to refactor and reimplement the very guts of Grid Engine. Fortunately, we’re also uniquely positioned to take on this holistic neurosurgery, as almost every member of the team that previously made open-source Grid Engine their focus has been gainfully employed by Univa since 2011.

Of course, as you’d expect, our uniquely qualified team focused considerable effort on the scheduler and, as one notable example, on the algorithms it employs to implement the share-tree policy. But we’ve also systematically teased apart, analysed, refactored and reimplemented the ancillary components that collectively make up Grid Engine, even the protocol used for internal communications between the master and its compute nodes. The outcome is nothing short of stunning: Univa Grid Engine bests open-source Grid Engine by a factor of 2, and we have the data to prove it.


Although you’ll find the details in the technical brief linked to this article below, I’d like to share a few critical data points. First, the benchmark results are based on the workload-management policies our customers actually use: multiple users with multiple jobs that must be scheduled according to shares or reservations. Second, we purposely disadvantaged the workload manager by starting it off with a backlog of jobs. Even under these conditions, you can quantitatively ‘see’ the basis for the requirements-as-wishes shared at the outset, as open-source Grid Engine visibly stalls while attempting to keep pace with the demands placed upon it. In striking contrast, the latest version of Univa Grid Engine sustains superlinear rates of job completion as submissions continue to escalate. In other words, we’ve significantly improved the performance of Univa Grid Engine relative to its open-source ancestor.

Although a 2X improvement over open-source Grid Engine is notable in and of itself, these gains fit into a broader performance context. In our experience, customers hack InfiniBand drivers, implement novel algorithms in CUDA to exploit GPUs, overlap I/O and communication with computation, and more, all in the name of performance. Here, we’re suggesting that the latest version of Univa Grid Engine introduces a new X factor: efficiencies that can, perhaps surprisingly, be gained through workload management. In other words, you might not have thought of your workload manager as a place to look for improved performance in your HPC environment.

We understand that you may approach such findings with a healthy degree of skepticism, as this is quite a departure from what you might be accustomed to. That being the case, we invite you to start by reviewing our Technical Brief, which provides quantitative details on points we’ve only alluded to here. Then, we hope you’ll get in touch and engage us in a dialogue, as we have even more to share. Finally, we understand that the most compelling benchmarking is that which you conduct yourself; of course, we’re here to facilitate that as well. With Univa Grid Engine 8.5.0, 2X marks the spot.

Ian Lumb is a Solutions Architect at Univa Corp.

 

Resource Links: