NVIDIA A100 Tensor Core GPUs come to Oracle Cloud

Oracle is bringing the newly announced NVIDIA A100 Tensor Core GPU to its Oracle Gen 2 Cloud regions. NVIDIA A100 is the first elastic, multi-instance GPU that unifies training, inference, HPC, and analytics. When running on Oracle Cloud, the new A100 GPUs will help enterprises unlock more value from their data and innovate faster, enabling important breakthroughs such as testing and developing new medications, building safer airplanes, and quickly sourcing natural resources.

“Oracle is enhancing what NVIDIA GPUs can do in the cloud,” said Vinay Kumar, vice president, product management, Oracle Cloud Infrastructure. “The combination of NVIDIA’s powerful GPU computing platform with Oracle’s bare metal compute infrastructure and low latency RDMA clustered network is extremely compelling for enterprises. Oracle Cloud Infrastructure’s high-performance file server solutions supply data to the A100 Tensor Core GPUs at unprecedented rates, enabling researchers to find cures for diseases faster and engineers to build safer cars.”

Next Generation GPU for Next Generation Discoveries

With NVIDIA A100 GPUs, Oracle Cloud Infrastructure delivers acceleration and flexibility for training, inferencing, and analytics. Oracle’s newest GPU bare metal and virtual machine instances will accelerate the time to discovery and empower Oracle customers to solve large problems in science, engineering and business.

“Our growing collaboration with Oracle is fueling incredible innovations across a wide range of industries and uses,” said Ian Buck, vice president and general manager of Accelerated Computing, NVIDIA. “By integrating NVIDIA’s new A100 Tensor Core GPUs into its cloud service offerings, Oracle is giving innovators everywhere access to breakthrough computing performance to accelerate their most critical work in AI, machine learning, data analytics and high performance computing.”

Many industries are using HPC technology to advance innovations and manage day-to-day business, including:

  • Genomics companies’ workloads include DNA sequencing and protein analysis, popular for ancestry studies, health testing, and the analysis of drug interactions to reduce time to market for new pharmaceutical products.
  • Retailers run AI models to analyze customer data and offer targeted purchase recommendations. These workloads benefit from GPU chips because the models are tightly coupled with accelerated hardware, yielding up to a 15X performance improvement.
  • Media and entertainment companies rely on HPC for animation, special effects rendering, and media transcoding. These tasks are characterized by bursty workloads requiring hundreds or even thousands of nodes running in parallel.
  • Financial technology companies run HPC tasks for risk analysis, high frequency trading, and financial modeling. This industry has occasional demands for lots of compute and storage resources, such as when running quarterly reports.
  • Automotive companies run complex simulations throughout the design, manufacture, and testing of new vehicles. HPC workloads support computer-aided engineering models for crash testing, simulations, and various types of analyses.
  • Oil and gas companies depend on HPC for geological modeling to predict where to find oil and natural gas resources. These jobs require spatial analysis, seismic analysis, and very large data sets.
  • Aerospace companies require HPC infrastructure for computational fluid dynamics, such as simulating airflow over airplane wings. These simulations require a lot of nodes, each with lots of CPUs and memory.

NVIDIA A100 GPU

Running NVIDIA A100 Tensor Core GPUs on Oracle Cloud Infrastructure offers the following advantages:

  • Scales up to clusters of thousands of interconnected servers running GPUs to power the most demanding AI and HPC workloads
  • Can partition each A100 GPU into as many as seven GPU instances using its multi-instance GPU feature to optimize utilization, and extend access to more teams and services
  • Provides the ability to scale down and partition into virtual GPUs to accommodate workloads that run best in a scaled down architecture
  • Accelerates all major deep learning frameworks such as TensorFlow, PyTorch and Caffe
  • Operates with more than 650 GPU applications for HPC and AI such as MATLAB, Gaussian and NAMD

The new GPU bare metal shape, BM.GPU4.8, will feature 8 x 40 GB NVIDIA A100 Tensor Core GPUs, all interconnected via NVIDIA NVLink. The on-board AMD Rome processors provide 64 physical cores running at 2.9 GHz. The new bare metal instance has 2,048 GB of memory, 24 TB of NVMe storage, and 1.6 Tbps of throughput with RDMA provided by Oracle’s Cluster Networking. This new shape will launch with limited availability soon, and will be available globally across Oracle Cloud regions in Europe, US, and JAPAC this summer.
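To make the BM.GPU4.8 figures concrete, the short Python sketch below works through the arithmetic implied by the numbers quoted in this article: 8 GPUs of 40 GB each, and up to seven multi-instance GPU partitions per A100. The `GpuShape` class and the `nodes_needed` helper are hypothetical names introduced only for illustration; they are not part of any Oracle or NVIDIA API. The per-node instance count is an upper bound that assumes every GPU is split into the maximum seven instances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuShape:
    """Hypothetical description of a GPU shape (not an Oracle API)."""
    name: str
    gpus: int                       # GPUs per bare metal node
    gpu_memory_gb: int              # memory per GPU, in GB
    max_instances_per_gpu: int      # maximum multi-instance GPU partitions

    @property
    def total_gpu_memory_gb(self) -> int:
        # Aggregate GPU memory across the whole node.
        return self.gpus * self.gpu_memory_gb

    @property
    def max_gpu_instances(self) -> int:
        # Upper bound: every GPU split into the maximum number of instances.
        return self.gpus * self.max_instances_per_gpu


# Figures taken from the article: 8 x 40 GB A100s, up to seven instances each.
BM_GPU4_8 = GpuShape(
    name="BM.GPU4.8",
    gpus=8,
    gpu_memory_gb=40,
    max_instances_per_gpu=7,
)


def nodes_needed(shape: GpuShape, gpus_required: int) -> int:
    """Whole nodes required to supply a given number of full GPUs."""
    return -(-gpus_required // shape.gpus)  # ceiling division


print(BM_GPU4_8.total_gpu_memory_gb)   # 320
print(BM_GPU4_8.max_gpu_instances)     # 56
print(nodes_needed(BM_GPU4_8, 20))     # 3
```

Under these assumptions, a single node offers 320 GB of aggregate GPU memory and at most 56 GPU instances, and a job that needs 20 full GPUs would span three nodes.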

In addition to the bare metal instance shapes, organizations will also be able to deploy virtual machines of one, two or four GPUs per virtual machine.

Accelerating Data Science and AI

Oracle Cloud Infrastructure Data Science enables teams of data scientists to easily build, train, and manage models on Oracle Cloud in a collaborative, managed environment. The service supports Python, JupyterLab, and a variety of the most popular open source packages for deep learning (such as TensorFlow, Keras, PyTorch and MXNet), machine learning (such as scikit-learn and XGBoost), visualization (such as Plotly and matplotlib), and much more. In a future release, data scientists will be able to access the new NVIDIA A100s on the service to speed up large-scale matrix calculations and parallelize large-scale machine learning and deep learning problems.

In addition, Oracle is enabling data scientists to deploy AI models through a pre-configured Image available from the Oracle Cloud Marketplace. This Image includes NVIDIA’s Deep Neural Network libraries, common ML/Deep Learning frameworks, Jupyter Notebooks and common Python/R integrated development environments. This Image also includes basic sample data, code for testing, and can be up and running in minutes. Users can deploy this data science and AI Image today using any of Oracle Cloud Infrastructure’s GPU offerings.

Joint Investment and Innovation in Enterprise AI

Oracle Labs has been working with NVIDIA to integrate CUDA-X libraries into applications created with GraalVM languages. A polyglot binding for GPUs allows GPU kernels to be directly launched from GraalVM languages such as R, JavaScript, Scala and other JVM-based languages. GPU acceleration in GraalVM enables real-time streaming prognostics by accelerating MSET2 (a machine learning method for anomaly detection), enhances conversational AI with Oracle Digital Assistant, and accelerates data science pipelines through the Oracle Cloud Data Science Platform.

These advancements will enable users to easily unlock their data, and integrate ML and Deep Learning into applications. With Oracle Cloud Infrastructure enabled by NVIDIA A100 GPUs, data science teams can continue to accelerate successful model deployment and produce enterprise-grade results and performance for predictive analytics to drive positive business outcomes.

Giving Startups World-Class Technology

Also announced today, Oracle for Startups and NVIDIA Inception are giving startups access to cloud computing power that is fast, scalable and highly secure. Eligible startups get access to Oracle for Startups program benefits: free Oracle Cloud credits for three months and a 70% discount for up to two years on ongoing cloud services. These can be used on Oracle Cloud Infrastructure’s portfolio of GPU offerings, such as NVIDIA Tesla V100 and the upcoming NVIDIA A100 GPUs. Existing Oracle for Startups members also get access to NVIDIA’s Inception program. You can read the announcement here and get started by applying at Oracle for Startups.

Getting Started

Users can get started on Oracle Cloud Infrastructure with free credits for premium services such as NVIDIA GPUs.
