Altair PBS Works Steps Up to Exascale and the Cloud

Sam Mahalingam is CTO of Altair.

In this video from SC19, Sam Mahalingam from Altair describes how the company is enhancing PBS Works software to ease the migration of HPC workloads to the Cloud. Altair’s cloud computing solutions deliver up-to-the-minute technology and access to the resources you need to solve problems and make world-changing discoveries.

“PBS Works runs big: 50,000 nodes in one cluster, 10,000,000 jobs in a queue, and 1,000 concurrent active users. It runs fast: 10,000,000 jobs per hour end-to-end throughput and 10-second end-to-end run for a single 4,000+ node job. It recovers from faults without losing work and is ICD 503-certified with Red Hat Enterprise Linux MLS. Some of the biggest computing systems in the world use Altair PBS Works, including those leading the race to exascale, and we’re helping shape the US exascale ecosystem as a member of the DOE’s Exascale Computing Project (ECP) Industry Council.”

Argonne National Laboratory has teamed with Altair to implement a new scheduling system that will be employed on the Aurora supercomputer, slated for delivery in 2021.

Aurora will be one of the nation’s first exascale systems. It will be in high demand from researchers around the world and, as a result, will need a sophisticated workload manager to sort and prioritize requested jobs. Argonne found a natural partner in Altair to meet that need.

Argonne was initially planning an update to its own workload manager, COBALT (Component-Based Lightweight Toolkit), which was developed 20 years ago within the lab’s own Mathematics and Computer Science Division. COBALT has served the Argonne Leadership Computing Facility (ALCF), a DOE Office of Science User Facility, for years, but after careful consideration of several factors, including cost and efficiency, the laboratory determined that a collaboration with Altair on the PBS Professional open source solution was the best path forward.

“When we went to talk to Altair, we were looking for a resource manager (one of the components in a workload manager) we could use,” said Bill Allcock, manager of the Advanced Integration Group at the ALCF. “We decided to collaborate on the entire workload manager rather than just the resource manager because our future roadmaps were well aligned.”
