GPU Accelerated Servers for Deep Learning Applications


In this week’s Sponsored Post, Katie (Garrison) Rivera of One Stop Systems explains how GPU accelerated servers can increase compute power.


Katie (Garrison) Rivera, Marketing Communications, One Stop Systems

Applications such as machine learning and deep learning require incredible compute power to provide artificial intelligence for self-driving cars, climate prediction, drugs that treat today’s worst diseases, and other solutions to the world’s most important challenges. There are many ways to increase compute power, but one of the easiest is to use the most powerful GPUs. The NVIDIA Tesla P100 is the most advanced data center accelerator ever built. The Tesla P100 features NVIDIA NVLink™ technology, which enables superior strong-scaling performance for high performance computing (HPC) and hyperscale applications. Up to eight Tesla P100 GPUs interconnected in a single node can deliver the performance of racks of commodity CPU-based servers.

There are many ways to increase compute power, but one of the easiest is to use the most powerful GPUs.
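Although this is a hardware story, the node-level picture is easy to verify from software. The short CUDA sketch below (our illustration, not anything shipped by OSS or NVIDIA) simply enumerates the GPUs in a node; on an eight-way P100 system it would list eight Pascal-class devices:

```c
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Found %d CUDA device(s)\n", count);

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Report name, memory, and compute capability; a Tesla P100
        // reports compute capability 6.0 (Pascal).
        printf("GPU %d: %s, %.1f GB, compute capability %d.%d\n",
               i, prop.name,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
               prop.major, prop.minor);
    }
    return 0;
}
```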

One Stop Systems introduced two new GPU accelerated servers at SC16 last month that can accommodate up to eight NVIDIA Tesla P100s to boost performance for machine learning and deep learning applications. The OSS-PASCAL4 and OSS-PASCAL8 are purpose-built for deep learning, with fully integrated hardware and software. The OSS-PASCAL8 is a 170-TeraFLOP (half precision) engine with 80GB/s NVLink for the largest deep learning models. The OSS-PASCAL4 provides 21.2 TeraFLOPS of double precision performance with an 80GB/s GPU peer-to-peer NVLink. These systems are tuned for out-of-the-box operation and quick, easy deployment. For large-scale operations, the OSS GPUltima provides a rack-based solution with the capacity to support up to 128 PCIe GPUs and 80 SXM2 GPUs in a single rack. The OSS-PASCAL4 and OSS-PASCAL8 can integrate into the GPUltima rack-level solution using 100Gb EDR InfiniBand interfaces to form large-scale multi-root peer-to-peer RDMA networks.


The OSS Deep Learning Appliances come with a choice of machine learning frameworks, such as Caffe, Torch, TensorFlow and Theano, and a choice of machine learning libraries, such as MLPython, NVIDIA cuDNN, DIGITS and CaffeOnSpark. GPU drivers, CUDA drivers, CUB and NCCL are supporting elements for the OSS-PASCAL4 and OSS-PASCAL8. Installing deep learning packages is time consuming, so these pre-installed items let deep learning users begin training networks right away. All GPUs are capable of peer-to-peer direct access to every other GPU’s memory, as well as direct transfer operations over NVLink at high bandwidth. These GPU accelerated servers provide high performance for collective communications, and the PCI Express (PCIe) bandwidth remains fully available for host and/or NIC communication during inter-GPU communication.
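The peer-to-peer behavior described above maps directly onto the standard CUDA runtime API. Here is a minimal sketch (an illustration for any P2P-capable multi-GPU node, not code from the OSS software stack) that checks and enables peer access between every pair of GPUs:

```c
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int count = 0;
    cudaGetDeviceCount(&count);

    // For every ordered pair of GPUs, check whether peer-to-peer access
    // is possible and enable it. On NVLink-connected P100s, peer
    // transfers then bypass the PCIe bus, leaving it free for host and
    // NIC traffic.
    for (int src = 0; src < count; ++src) {
        cudaSetDevice(src);
        for (int dst = 0; dst < count; ++dst) {
            if (src == dst) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, src, dst);
            if (canAccess) {
                cudaDeviceEnablePeerAccess(dst, 0);
                printf("P2P enabled: GPU %d -> GPU %d\n", src, dst);
            }
        }
    }
    return 0;
}
```

Once peer access is enabled, cudaMemcpyPeer (or direct dereferencing of a peer GPU’s pointers inside a kernel) moves data between GPUs without staging through host memory.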

GPU management and monitoring software comes pre-installed and provides both health and workload management. It samples all of the metrics exposed by the NVIDIA GPUs and automatically performs health checks on every GPU. It integrates with the popular HPC workload managers and automatically configures GPUs within the workload manager. These features make it easy to monitor the GPUs, reducing downtime and simplifying serviceability. That assurance matters because deep learning applications demand so much compute power that users need every GPU running at optimal performance.
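The bundled management software isn’t named here, but the metrics it samples are the kind exposed by NVIDIA’s NVML library, the same interface that nvidia-smi builds on. A minimal monitoring sketch using NVML (our illustration; compile and link with -lnvidia-ml):

```c
#include <stdio.h>
#include <nvml.h>

int main(void) {
    // Initialize NVML before any other calls.
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "failed to initialize NVML\n");
        return 1;
    }

    unsigned int count = 0;
    nvmlDeviceGetCount(&count);

    for (unsigned int i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(i, &dev);

        char name[NVML_DEVICE_NAME_BUFFER_SIZE];
        nvmlDeviceGetName(dev, name, sizeof(name));

        // Sample two of the health metrics a monitoring daemon would
        // poll: core temperature and GPU/memory utilization.
        unsigned int temp = 0;
        nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp);

        nvmlUtilization_t util;
        nvmlDeviceGetUtilizationRates(dev, &util);

        printf("GPU %u (%s): %u C, %u%% GPU / %u%% memory utilization\n",
               i, name, temp, util.gpu, util.memory);
    }

    nvmlShutdown();
    return 0;
}
```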

This guest article was submitted by Katie (Garrison) Rivera, marketing communications at One Stop Systems.