Lambda Launches Nvidia-Based Cloud Clusters for AI Model Training

SAN JOSE, July 24, 2024 — GPU cloud company Lambda has unveiled Lambda 1-Click Clusters, designed to give AI engineers and researchers short-term access to multi-node GPU clusters in the cloud for large-scale AI model training.
Lambda said the launch marks the first time on-demand access to NVIDIA H100 Tensor Core GPUs across 2 to 64 nodes has been offered through a self-serve cloud service, without requiring expensive long-term contracts.

Lambda 1-Click Clusters address the particular needs of today’s AI teams, who often cannot afford, and rarely require, 24/7 access to top-end GPUs for an entire year or longer. These teams instead need to quickly spin up a short-term cluster with hundreds of GPUs for a few weeks to run experiments, take a few weeks off to regroup without leaving expensive GPUs sitting idle, then prepare for the next iteration. Connecting these engineers and researchers to a large number of GPUs when they need them, paired with ease of access and use, helps ensure that today’s AI innovation momentum isn’t hindered by financial or contractual constraints.

“Lambda has solved a complex compute challenge only a few very large companies have: partitioning a large, high-performance AI deployment to make smaller GPU clusters. Making this available on demand, through a self-serve model with no humans in the loop – all packaged with our trademark ease of use – adds to why we’re so thrilled about this launch,” said Robert Brooks, founding team and VP of Revenue at Lambda. “Our 1-Click Clusters are a big step toward a world where GPU availability, long-term contracts and high costs don’t stand between AI teams and the ability to turn their ideas into breakthroughs.”

Lambda 1-Click Clusters, which carry a reservation minimum of only two weeks, provide access to multi-node clusters of 16 to 512 interconnected NVIDIA H100 GPUs with NVIDIA Quantum-2 InfiniBand networking. The introduction of 1-Click Clusters is the latest example of Lambda’s commitment to democratizing access to, and usage of, distributed training for the AI community, in a market where high-performance compute access is often reserved for large enterprises and AI labs. The launch follows the company’s April 2024 announcement of a $500 million GPU-backed financing facility to expand its on-demand cloud offering, as well as a $320 million Series C funding round in February 2024.

“Bringing powerful, accelerated computing access to a broad range of teams and needs will be critical to achieving transformative AI breakthroughs in every industry. Lambda’s 1-Click Clusters, providing self-serve access to NVIDIA H100 GPUs and NVIDIA Quantum-2 InfiniBand, will pioneer innovative ways for teams to take on unique and demanding AI workloads,” said Dave Salvator, Director of Accelerated Computing Products at NVIDIA.

“Lambda’s 1-Click Clusters provide stable training infrastructure and are easy to set up, with direct access to bare-metal GPUs, massive local storage and stable networking at scale,” said Mahmoud Felfel, co-founder of PlayHT, a leader in conversational voice AI. “It has been our best experience in training runs, and as a team they have been super responsive and supportive across the board.”