JARVICE™ XE, The Platform Powering Hybrid Supercomputing

What’s new for 2022

Since its debut in 2019 as the world’s first container-native, hybrid cloud supercomputing application delivery platform, JARVICE™ XE has seen rapid innovation and increased customer adoption.  After Nimbix became an Atos company in 2021, the JARVICE™ XE platform expanded its footprint across the industry’s broadest range of high-performance computing solutions, powering any combination of resources from the fastest supercomputers in the world to public, private, and hybrid clouds, all while delivering the most intuitive and comprehensive user experience for scientists, engineers, developers, and independent software vendors (ISVs).

For 2022, JARVICE™ XE gains major new features to improve utilization and enable converged HPC and AI on novel infrastructures.

Advanced License-based Queuing

Strategies to maximize the use of solver software licenses are imperative to any successful HPC deployment.  License costs are among the highest budgetary line items and must be managed effectively to ensure organizations are getting the best value possible.  This includes driving maximum utilization when needed, while ensuring users get an allocation proportional to the priority of their projects at any given time.

New for 2022, JARVICE™ XE [1] now supports “preemptible features” in its ISV license management, automatically suspending solver license use to make room for higher priority jobs based on project prioritization.  The feature also allows defining a minimum allocation for each project regardless of its priority, ensuring a “fair share” of licenses even under full utilization, as can happen with circuit simulations during “tape out.”  What’s more, like all mechanisms in JARVICE™ XE, advanced license-based queuing works across an entire deployment topology, including multi-site, multi-cloud, and hybrid.  Organizations can manage their software licenses centrally regardless of where users sit or where compute takes place, even in federated deployments where data sets are separate.

Additional features include the ability to adjust and reconfigure project allocations dynamically, inspect utilization metrics in real-time, and combine multiple license servers for failover and high availability.
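
To make the idea concrete, the following is a minimal sketch of priority-based license allocation with guaranteed per-project minimums.  It is not the JARVICE™ XE implementation or API; the Project fields, the function names, and the convention that a larger number means higher priority are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    """Hypothetical project record; not a JARVICE XE data structure."""
    name: str
    priority: int  # assumption: larger number = higher priority
    minimum: int   # licenses guaranteed regardless of priority
    demand: int    # licenses the project's jobs want right now


def allocate_licenses(total, projects):
    """Grant guaranteed minimums first, then hand out the rest by priority."""
    # Step 1: every project gets its minimum, capped by what it actually wants.
    grants = {p.name: min(p.minimum, p.demand) for p in projects}
    remaining = total - sum(grants.values())
    if remaining < 0:
        raise ValueError("guaranteed minimums exceed the license pool")
    # Step 2: remaining licenses go to the highest-priority projects first.
    for p in sorted(projects, key=lambda p: p.priority, reverse=True):
        extra = min(p.demand - grants[p.name], remaining)
        grants[p.name] += extra
        remaining -= extra
    return grants


def licenses_to_suspend(in_use, grants):
    """Return how many licenses each project must release to match its grant."""
    return {
        name: used - grants.get(name, 0)
        for name, used in in_use.items()
        if used > grants.get(name, 0)
    }


# A 10-license pool during "tape out": research currently holds 6 licenses.
projects = [
    Project("tapeout", priority=10, minimum=2, demand=8),
    Project("research", priority=1, minimum=3, demand=6),
]
grants = allocate_licenses(10, projects)
print(grants)                                        # {'tapeout': 7, 'research': 3}
print(licenses_to_suspend({"research": 6}, grants))  # {'research': 3}
```

In this sketch the lower-priority project is preempted down to its guaranteed minimum, never below it, which is the “fair share” behavior described above.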

Converged HPC and AI

Connecting specialized infrastructure to a heterogeneous HPC environment, and managing it there, is key to unlocking the next generation of applications and breaking through the confines of “the end of Moore’s Law.”

New in JARVICE™ XE is the ability to schedule jobs on Atos ThinkAI systems [2] leveraging Graphcore IPUs (Intelligence Processing Units), for both training at scale and inference. Rather than requiring a separate system to manage, this mechanism is available to the same JARVICE™ XE deployment controlling any combination of CPU-only or GPU-enabled HPC nodes.  With this capability, customers can enable the right computational accelerators for each workload while maintaining a consistent user experience. Jobs may range from a single IPU to dozens of IPUs, and may also include multi-node, scale-out AI training on the latest NVIDIA GPUs.

In all cases, JARVICE™ XE’s enterprise access controls, security policies, and resource allocation mechanisms apply to teams, projects, and users leveraging IPUs.  Customers can either run “out of the box” AI applications available on the HyperHub™ marketplace [3] or deploy their own custom codes based on popular frameworks such as TensorFlow.

A Single Control Plane for Global HPC

New capabilities in JARVICE™ XE include integrating traditional HPC applications and systems, such as schedulers and application clients, into a single global control plane.  Organizations may define a processing topology with any combination of bare-metal supercomputers, public cloud endpoints, “regional/sovereign” cloud resources, and other “HPC-as-a-service” mechanisms.  JARVICE™ XE’s intuitive interface delivers the best user experience for both end-users and operators from a “single pane of glass,” seamlessly integrating with storage, datasets, identity services, and application sources.  Customers may choose to either manage this topology themselves or leverage Atos cloud and HPC expertise for a truly concierge-level experience.

As advanced computing needs evolve, an organization’s global HPC footprint, powered by JARVICE™ XE, can adapt seamlessly and without disruption.  Literally overnight, users can enjoy new capacity, new capabilities, and new applications to meet the needs of their most demanding projects.  The fully dynamic nature of JARVICE™ XE affords expansion and reconfiguration at any time, without ever having to go back to the “drawing board.”  Customers can choose the right mix of dedicated and on-demand resources to fit budget and production needs without burdening end-users with making infrastructure choices just to get their work done.  The platform seamlessly routes users to the right compute resources based on geography, data, workflow, and policies.

When it comes time to analyze spend, JARVICE™ XE provides fully granular accounting information to operators so they can make informed decisions.  This data is available by zone, resource, team, project, user, and application, across any period of time, down to each job.  Operators can analyze whether users are requesting too many or too few resources for various workflows, waiting too long in queues, or simply not running enough jobs to merit their resource allocation.

Accounting information in JARVICE™ XE is available either via web portal or API, allowing customers to download the raw data into their own analysis platforms as needed.  Whether determining appropriate hardware budgets or identifying underutilized zones and resources, customers can rest easy knowing they are using complete, authoritative data to drive these decisions.
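
As an illustration of the kind of analysis this enables, here is a minimal sketch that summarizes per-job accounting records after they have been downloaded.  The record fields used here (project, requested_core_hours, used_core_hours, queue_minutes) are hypothetical and do not reflect the actual JARVICE™ XE export format; no JARVICE™ XE API is called.

```python
from collections import defaultdict


def summarize_by(records, key):
    """Aggregate per-job records by `key` (e.g. "project", "user", "zone")."""
    totals = defaultdict(
        lambda: {"jobs": 0, "requested": 0.0, "used": 0.0, "queue": 0.0}
    )
    for r in records:
        t = totals[r[key]]
        t["jobs"] += 1
        t["requested"] += r["requested_core_hours"]
        t["used"] += r["used_core_hours"]
        t["queue"] += r["queue_minutes"]

    summary = {}
    for name, t in totals.items():
        summary[name] = {
            "jobs": t["jobs"],
            # Fraction of requested core-hours actually consumed; a low value
            # suggests users are requesting more resources than they need.
            "utilization": t["used"] / t["requested"] if t["requested"] else 0.0,
            # Mean time spent waiting in the queue, in minutes.
            "avg_queue_minutes": t["queue"] / t["jobs"],
        }
    return summary


# Hypothetical records standing in for downloaded accounting data.
records = [
    {"project": "cfd", "requested_core_hours": 100, "used_core_hours": 50, "queue_minutes": 10},
    {"project": "cfd", "requested_core_hours": 100, "used_core_hours": 100, "queue_minutes": 30},
    {"project": "eda", "requested_core_hours": 40, "used_core_hours": 40, "queue_minutes": 0},
]
for project, stats in summarize_by(records, "project").items():
    print(project, stats)
```

The same function can group by any field present in the records, mirroring the zone, team, project, user, and application breakdowns described above.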

The Future of HPC-as-a-Service

The possibilities for global HPC-as-a-Service [4] managed with a “single pane of glass” are endless, but we are only at the beginning.  In the coming year, JARVICE™ XE will continue to evolve to improve the user experience and deliver more capabilities than were ever thought possible.  Significant areas of innovation include, but are not limited to, global data management, greater ease of use, and more advanced budget controls.

Whether leading a digital transformation in manufacturing, discovering “the cure,” finding new sources of energy, or improving the interaction between humans and machines, JARVICE™ XE takes care of managing advanced computing infrastructure while freeing up users to innovate in ways they didn’t imagine were possible. Stay tuned for more exciting announcements from Atos and visit us at ISC 2022, May 30th – June 1st, 2022.

About the Author

Leo Reiter, CTO and Technical Director of the Atos HPC Cloud Competency Center – Nimbix, an Atos company.

[1] https://www.nimbix.net/jarvicexe

[2] https://atos.net/en/solutions/high-performance-computing-hpc/thinkai

[3] https://www.nimbix.net/hyperhub-application-marketplace

[4] https://atos.net/en/solutions/high-performance-computing-hpc/hpc-as-a-service