Interview: Managing HLRN Cray Cascade Systems with Moab

This week Adaptive Computing announced that the HLRN Consortium in Germany will be using Moab to manage its new Cray Cascade supercomputers. To learn more I caught up with Wolfgang Dreyer, Lee Carter, and Chad Harrington from Adaptive.

insideHPC: This win at HLRN in Germany is part of an ongoing relationship. How long has the HLRN consortium been using Moab?

Wolfgang Dreyer: HLRN is a long-time customer of Adaptive Computing. During the tender, HLRN looked at the competitive solutions available, but its experience with Moab, together with Cray including Moab in its offering, made this strong partnership the winning team.

Lee Carter: HLRN first became an Adaptive (Moab) customer back in 2008 when they purchased their existing SGI environment – the system the new Cray hardware will be replacing.

insideHPC: What do you think makes Moab the resource management tool of choice for HLRN?

Wolfgang Dreyer: HLRN already uses a wide variety of Moab modules, Grid functionality and Accounting Manager among them. HLRN is also keen to adopt the newly developed Power functionality on an optimized Cray environment.

Lee Carter: In addition, maximum, business-aligned utilization of their systems is important. Our alignment of budget allocations to utilization using Moab Accounting Manager is key. Power-aware workload management is very much an additional value-add capability we will be jointly exploring and weaving into their day-to-day operations in collaboration with Cray and HLRN going forward.

insideHPC: The Cray “Cascade” systems at HLRN represent the state of the art in clustering technology from the company and even feature their own version of Linux. How closely do you work with Cray to ensure that Moab can optimize management of HPC resources?

Chad Harrington: Adaptive has long worked closely with Cray, since we have many common customers. Cray makes many of the world’s largest systems and Moab is particularly well suited for very large systems and workloads. As a result, Adaptive and Cray work together to ensure that Moab can take best advantage of Cray’s unique architecture and capabilities.

Lee Carter: Cray Cascade systems have a special interconnect technology invented by Cray. Moab is aware of this interconnect structure and, depending on the job specification, can place jobs on specific blades, taking interconnect hops into account or considering cache and memory availability. These are only a small fraction of the parameters Moab can handle on Cray systems.

insideHPC: HLRN does a wide variety of research spanning from bio-informatics, chemistry, climate and ocean modeling, engineering, environmental research, and fluid dynamics to physics. With such a diverse workload, how do you ensure that the systems don’t get bogged down and are kept busy?

Wolfgang Dreyer: Moab HPC Suite, Enterprise Edition has features that HLRN has been using for a long time. One is the Grid Option, which helps ensure that both clusters are load balanced. Load balancing across two remote locations poses special challenges: you must take into account time delays and communication with an independent cluster that has its own, constantly changing job responsibilities.

The second feature is Accounting Manager, which is integrated into Enterprise Edition. It administers accounts for the different research groups, each of which gets “fair share” usage of the cluster. Fair share can be based on money, compute time, or other parameters available in Accounting Manager.
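To illustrate the fair-share idea in general terms, here is a minimal Python sketch. It is not Moab's implementation or configuration: the function name `fairshare_order`, the group names, and the numbers are all hypothetical, and it assumes fair share is measured purely in consumed core-hours against a target fraction of the machine.

```python
def fairshare_order(targets, usage):
    """Return group names with the most under-served group first.

    targets: group -> target share of the machine (fractions summing to 1.0)
    usage:   group -> core-hours consumed in the accounting window
    Both mappings and their values are illustrative assumptions.
    """
    total = sum(usage.values())

    def deficit(group):
        # Positive deficit: the group has used less than its target share.
        actual = usage.get(group, 0.0) / total if total else 0.0
        return targets[group] - actual

    return sorted(targets, key=deficit, reverse=True)


# Hypothetical research groups and usage figures.
targets = {"climate": 0.5, "chemistry": 0.3, "physics": 0.2}
usage = {"climate": 700.0, "chemistry": 100.0, "physics": 200.0}
order = fairshare_order(targets, usage)
print(order)
```

In this example the chemistry group has used far less than its target, so its jobs would be favored next, while the climate group, which has exceeded its share, moves to the back.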

Moab policies ensure that cluster utilization stays optimized even when a group of researchers has no jobs to run at a given time. In that case, low-priority jobs that would normally run later can start right away (also known as backfill). The policy engine ensures that backfill jobs are deprioritized or suspended when a new high-priority job is expected to run.
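The backfill behavior described above can be sketched in a few lines of Python. This is a simplified illustration, not Moab's algorithm: the `Job` class, `pick_backfill`, and all values are hypothetical. It also swaps in a more conservative rule than the one in the interview: instead of suspending backfill jobs, it only starts jobs whose requested walltime ends before the high-priority job's reserved start.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int      # nodes requested
    walltime: int   # requested runtime in minutes
    priority: int   # higher value = more important


def pick_backfill(queue, free_nodes, minutes_until_reservation):
    """Pick waiting jobs that can start now without delaying the reserved job.

    A job qualifies if it fits on the currently idle nodes and its requested
    walltime ends before the high-priority job is due to start.
    """
    started = []
    for job in sorted(queue, key=lambda j: j.priority, reverse=True):
        fits = job.nodes <= free_nodes
        finishes_in_time = job.walltime <= minutes_until_reservation
        if fits and finishes_in_time:
            started.append(job.name)
            free_nodes -= job.nodes
    return started


# Hypothetical state: 8 idle nodes, a large job reserved to start in 60 minutes.
queue = [
    Job("a", nodes=4, walltime=30, priority=1),
    Job("b", nodes=6, walltime=45, priority=2),
    Job("c", nodes=2, walltime=120, priority=3),
    Job("d", nodes=2, walltime=50, priority=1),
]
started = pick_backfill(queue, free_nodes=8, minutes_until_reservation=60)
print(started)
```

Here job `c` has the highest priority but would run past the reservation, so it waits. Jobs `b` and `d` fill the idle nodes in the meantime, and job `a` no longer fits once `b` has started.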