ClearML Announces AI Infrastructure Orchestration and Compute Management

May 17, 2024 — Open source AI platform company ClearML today announced the release of new AI orchestration and compute management capabilities, making it the first AI platform to support Kubernetes, Slurm, PBS, and bare metal for seamless orchestration of AI and machine learning workloads. ClearML now offers the broadest support for AI and HPC workloads on the market, according to the company.
The newly released functionality enables AI practitioners to automate manual and repetitive tasks and broadens ClearML’s AI infrastructure management to cover computing clusters running the Simple Linux Utility for Resource Management (Slurm) or Altair PBS. The addition of Slurm and PBS, popular open-source workload managers widely used in high-performance computing (HPC) environments, extends ClearML’s existing support for all popular Kubernetes variants as well as bare metal.
“Customer AI deployments are entering a new era of complexity, spanning across diverse environments such as cloud, edge, and on-premises data centers,” said Moses Guttmann, Co-founder and CEO of ClearML. “To navigate this complexity and ensure optimal performance, sophisticated scheduling and orchestration is paramount. ClearML’s new capabilities reduce the overhead of managing and controlling AI infrastructure, empowering AI Builders to scale their AI and machine-learning workflows with unprecedented flexibility, ease and efficiency – giving organizations ultimate control over their AI infrastructure at any scale while seamlessly integrating ClearML.”
Guttmann noted that ClearML also expanded its scheduling and triggering capabilities to further boost an AI team’s efficiency and productivity with the ability to run tasks automatically based on predetermined times or events, an extension of ClearML’s “set-it-and-forget-it” approach to eliminating manual tasks.
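For illustration, a minimal sketch of this time-based scheduling using the TaskScheduler class from ClearML’s open source Python SDK; the task ID and queue names below are placeholders rather than values from this announcement:

    # Sketch: re-run an existing ClearML task every day at 03:00.
    from clearml.automation import TaskScheduler

    scheduler = TaskScheduler()

    # Clone and enqueue the referenced task daily; a ClearML agent listening
    # on the "default" queue (placeholder name) picks it up and executes it.
    scheduler.add_task(
        schedule_task_id="<existing-task-id>",  # placeholder: task to re-run
        queue="default",                        # placeholder: execution queue
        hour=3,
        minute=0,
        recurring=True,
    )

    # Run the scheduler itself as a long-lived service task.
    scheduler.start_remotely(queue="services")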
Now, DevOps teams can focus on what matters in getting AI to production, rather than spending time on laborious tasks such as storage maintenance, babysitting AI workflows, provisioning machines, or doling out credentials. ClearML continues to streamline manual, mundane tasks while decreasing friction and overhead for AI team admins, allowing them to spend less time on setup and more time on innovation, delivering faster time to value and driving costs down.
This announcement follows the company’s most recent release (announced at NVIDIA GTC 2024), which enables granular management and visibility of compute resource allocations and includes open source fractional GPU capabilities. The company’s expanded capabilities in orchestration, compute management, and AI infrastructure control establish ClearML as the most comprehensive platform available for AI Builders and DevOps professionals, who use ClearML to build, train, and deploy models at any scale on any AI infrastructure. AI teams can work on shared data and build models seamlessly from anywhere in the world on any AI workload, compute type, or infrastructure – regardless of whether they’re on-prem, cloud, or hybrid; with Kubernetes, Slurm, PBS, or bare metal; and with any type of GPU.
With ClearML, AI Builders and DevOps teams gain ultimate control and granular visibility over which resources, and fractions of resources, each team or group can access, and can easily self-serve resources without changing their existing AI/ML workflows. According to the company’s recent survey report, The State of AI Infrastructure 2024, 25% of the 1,000 IT leaders surveyed stated that their company uses Slurm or another open source tool for scheduling and job management. Because Slurm is Linux-native and designed for AI/HPC workloads, it is also widely used on many of the world’s most advanced supercomputers.
ClearML’s Slurm/PBS integration enables AI teams to get more out of their Slurm/PBS computing clusters with a single line of code. AI/HPC jobs can be launched from anywhere (code, command-line interface, Git, or web UI), and organizations can monitor their Slurm/PBS queues on the platform’s orchestration dashboards. In this way, ClearML helps integrate HPC workloads into an organization’s CI/CD infrastructure, so customers can securely launch jobs on their clusters from an external endpoint. ClearML’s Slurm/PBS support creates transparency for teams and lets them focus expensive resources on delivering innovation rather than leaving them standing by, idle and underutilized.
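As a hedged example, a job could be launched from code roughly as follows, assuming a ClearML queue (named "slurm-gpu" here purely for illustration) that an administrator has mapped to the organization’s Slurm or PBS cluster:

    # Sketch: launch an AI/HPC job from code onto a ClearML queue backed by Slurm/PBS.
    from clearml import Task

    # Project and task names are placeholders for this sketch.
    task = Task.init(project_name="hpc-experiments", task_name="train-large-model")

    # Hand the task off for remote execution: it is enqueued, and ClearML's
    # orchestration submits it to the cluster serving the "slurm-gpu" queue.
    task.execute_remotely(queue_name="slurm-gpu", exit_process=True)

    # Training code below this point runs only when the task executes remotely.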
In addition, ClearML’s extended capabilities enable AI Builders to use the ClearML platform to build scheduling logic and pass it through seamlessly to Slurm/PBS for execution. Organizations can now leverage the best parts of Slurm/PBS without the extensive coding typically required.
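As one hedged illustration of such scheduling logic, a trigger built with the SDK’s TriggerScheduler could launch a training task on a Slurm/PBS-backed queue whenever a new dataset version is published; the task ID, project, and queue names here are placeholders:

    # Sketch: event-driven scheduling logic in ClearML, executed on a Slurm/PBS-backed queue.
    from clearml.automation import TriggerScheduler

    trigger = TriggerScheduler()

    # When a dataset is published in the monitored project, clone and enqueue
    # the referenced training task on the cluster-backed queue.
    trigger.add_dataset_trigger(
        schedule_task_id="<training-task-id>",  # placeholder: task to launch
        schedule_queue="slurm-gpu",             # placeholder: Slurm/PBS-backed queue
        trigger_project="datasets/production",  # placeholder: project to watch
        trigger_on_publish=True,
    )

    # Keep the trigger running as a long-lived service task.
    trigger.start_remotely(queue="services")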