Oracle Announces Zettascale Cloud Supercluster with 131,000 Blackwells

Oracle today announced what it said is the first zettascale cloud HPC cluster, powered by Nvidia’s forthcoming Blackwell GPUs, scheduled for shipment in the first half of 2025. Oracle Cloud Infrastructure (OCI) Superclusters will be available with up to 131,072 Blackwell GPUs delivering 2.4 zettaFLOPS of peak performance, according to Oracle.

The company said this offers more than three times as many GPUs as the Frontier supercomputer, currently the world’s most powerful system on the TOP500 list, and more than six times as many as other hyperscalers offer. OCI Supercluster includes OCI Compute Bare Metal, low-latency RoCEv2 networking with ConnectX-7 NICs and ConnectX-8 SuperNICs, or Nvidia Quantum-2 InfiniBand-based networks, and a choice of HPC storage.

OCI Superclusters also can be ordered with OCI Compute powered by either Nvidia H100 or H200 Tensor Core GPUs.

OCI Superclusters with H100s can scale up to 16,384 GPUs with up to 65 exaFLOPS of performance and 13 Pb/s of aggregated network throughput.

OCI Superclusters with H200s will scale to 65,536 GPUs with up to 260 exaFLOPS of performance and 52 Pb/s of aggregated network throughput and will be available later this year.
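A quick back-of-the-envelope check shows the per-GPU peak these cluster totals imply. This is illustrative arithmetic only, assuming the quoted figures are vendor peak numbers (likely low-precision, sparsity-enabled throughput):

```python
# Illustrative arithmetic from the cluster figures quoted above; the
# "peak performance" numbers are vendor-quoted and likely assume
# low-precision (FP8/FP4), sparsity-enabled math.

clusters = {
    # name: (GPU count, peak exaFLOPS)
    "H100": (16_384, 65),
    "H200": (65_536, 260),
    "Blackwell": (131_072, 2_400),  # 2.4 zettaFLOPS = 2,400 exaFLOPS
}

for name, (gpus, exaflops) in clusters.items():
    pflops_per_gpu = exaflops * 1_000 / gpus  # 1 exaFLOPS = 1,000 petaFLOPS
    print(f"{name}: {pflops_per_gpu:.2f} petaFLOPS per GPU")
```

The result is roughly 4 petaFLOPS per Hopper GPU and 18 petaFLOPS per Blackwell, which appears consistent with Nvidia’s published FP8 and FP4 sparse peak figures respectively, suggesting the cluster totals are low-precision, sparse numbers rather than FP64 HPL performance.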

OCI Superclusters with Nvidia GB200 NVL72 liquid-cooled bare-metal instances (pairing the Arm-based Grace CPU with Blackwell GPUs) will use NVLink and NVLink Switch to enable up to 72 Blackwell GPUs to communicate with each other at an aggregate bandwidth of 129.6 TB/s in a single NVLink domain. Fifth-generation NVLink and NVLink Switch handle communication within that domain, while cluster networking extends GPU-to-GPU communication across the full cluster.
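The NVL72 figures above imply a per-GPU NVLink bandwidth, sketched here as illustrative arithmetic:

```python
# Per-GPU bandwidth implied by the NVL72 figures quoted above:
# 72 Blackwell GPUs sharing 129.6 TB/s of aggregate NVLink bandwidth.

NVL72_GPUS = 72
AGGREGATE_NVLINK_TB_S = 129.6

per_gpu_tb_s = AGGREGATE_NVLINK_TB_S / NVL72_GPUS
print(f"{per_gpu_tb_s:.1f} TB/s per GPU")  # 1.8 TB/s
```

The 1.8 TB/s per GPU matches Nvidia’s published per-GPU bandwidth for fifth-generation NVLink, so the 129.6 TB/s aggregate appears to be simply the sum of the 72 per-GPU links.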

“We have one of the broadest AI infrastructure offerings and are supporting customers that are running some of the most demanding AI workloads in the cloud,” said Mahesh Thiagarajan, executive vice president, Oracle Cloud Infrastructure. “With Oracle’s distributed cloud, customers have the flexibility to deploy cloud and AI services wherever they choose while preserving the highest levels of data and AI sovereignty.”

Oracle said customers such as WideLabs, an AI medical startup in Brazil, and Zoom use OCI’s AI infrastructure.


Zoom uses OCI to provide inference for Zoom AI Companion, the company’s AI personal assistant, designed to help users draft emails and chat messages, summarize meetings and chat threads and generate ideas during brainstorming sessions with colleagues. OCI’s data and AI sovereignty capabilities are intended to help Zoom keep customer data locally in region and support AI sovereignty requirements in Saudi Arabia, where OCI’s solution is initially being rolled out.

“As businesses, researchers and nations race to innovate using AI, access to powerful computing clusters and AI software is critical,” said Ian Buck, vice president of Hyperscale and High Performance Computing, Nvidia. “Nvidia’s full-stack AI computing platform on Oracle’s broadly distributed cloud will deliver AI compute capabilities at unprecedented scale to advance AI efforts globally and help organizations everywhere accelerate research, development and deployment.”

WideLabs, an applied AI startup in Brazil, is training the Amazonia IA LLM on OCI. The company developed bAIgrapher, an application that uses its LLM to generate biographical content based on data collected from patients with Alzheimer’s disease to help them preserve important memories.

WideLabs uses the Oracle Cloud São Paulo Region to run its AI workloads and keep sensitive data within Brazil. It uses OCI AI infrastructure with H100 GPUs to train its LLMs, as well as Oracle Kubernetes Engine to provision, manage, and operate GPU-powered containers across an OCI Supercluster consisting of OCI Compute connected with OCI’s RDMA-based cluster networking.

“OCI AI infrastructure offers us the most efficiency for training and running our LLMs,” said Nelson Leoni, CEO, WideLabs. “OCI’s scale and flexibility are invaluable as we continue to innovate in the healthcare space and other key sectors.”