CocoLink Using Consumer GPUs for Deep Learning


CocoLink, a subsidiary of Seoul National University, in collaboration with Orange Silicon Valley, has upgraded its KLIMAX 210 server with 20 of the latest GeForce GTX 1080 GPUs, with the eventual goal of scaling the single 4U system to more than 200 teraflops.

The system is being used to test the limits of deep learning algorithms precisely because of the number of GPUs packed into a single server. Deep learning appliances such as the KLIMAX 210 and Nvidia's DGX-1 offer a bridge between today's deep learning technology, which can scale to around 10 GPUs, and cluster-based deep learning, which will scale across much larger systems.

CocoLink first announced its KLIMAX 210 server, with space for 20 GPUs in a single 4U rack, in March of this year. The initial project succeeded in setting up a system with 20 functional GPUs inside a single server; with 20 NVIDIA K40 GPUs, the system was capable of delivering 100 teraflops of computational power. The company has now swapped the 20 K40s for Nvidia's latest consumer GeForce GPU, the GTX 1080, giving the system the potential to deliver more than 200 teraflops once the software can be scaled to run efficiently across all of the cards.
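As a rough sanity check on these figures, a GPU's peak single-precision throughput can be estimated as CUDA cores × 2 FLOPs per cycle (one fused multiply-add) × clock speed. The short Python sketch below applies this to Nvidia's published reference specifications for the GTX 1080; the numbers are Nvidia's, not measurements from CocoLink's system.

```python
# Back-of-envelope FP32 peak for the upgraded KLIMAX 210.
# Figures are Nvidia's reference specs for the GTX 1080,
# not measurements from CocoLink's system.
CUDA_CORES = 2560
FLOPS_PER_CYCLE = 2        # one fused multiply-add (FMA) = 2 FLOPs per core per cycle
BOOST_CLOCK_GHZ = 1.733    # reference boost clock; factory-overclocked cards run higher

per_gpu_tflops = CUDA_CORES * FLOPS_PER_CYCLE * BOOST_CLOCK_GHZ / 1000.0
print(f"one GTX 1080:  {per_gpu_tflops:.1f} TFLOPS")        # ~8.9 TFLOPS
print(f"20 x GTX 1080: {20 * per_gpu_tflops:.1f} TFLOPS")   # ~177 TFLOPS

# Hitting the quoted 200+ teraflops on 20 cards implies sustained clocks of
# roughly 200e3 / (20 * 2560 * 2) ≈ 1.95 GHz, i.e. overclocked parts.
```

This is why the 200-teraflop figure is framed as a potential rather than a current capability: it depends both on sustained clock speeds above the reference boost and on software that scales efficiently across all 20 cards.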

This upgrade from the K40 to the GTX 1080, which is based on Nvidia's latest Pascal architecture, represents the first time that commercial GPUs, typically associated with the consumer gaming market, have been validated for deep learning in a system of this density.

CocoLink reports that the GTX 1080-based system is up to 3.5 times faster on image-recognition deep learning workloads than the enterprise-grade NVIDIA Tesla K40. It should be noted, however, that the K40 was first unveiled in late 2013. The latest Pascal-based GPU for deep learning, the Tesla P100, offers more than four times the performance of the K40, and that performance could be increased further by using NVLink, although the KLIMAX 210 server does not currently support this technology.

To date, the team has loaded the KLIMAX 210 server with 10 Pascal GPUs and has successfully scaled the algorithm across eight of them, delivering 106 teraflops of single-precision performance.

A team of artificial intelligence (AI) researchers at Orange France was able to scale Caffe to eight GPUs using the beta release of CUDA 8.0 together with cuDNN 4 and cuDNN 5. The eventual objective is to scale Caffe to take advantage of all 20 Pascal GPUs; the fully populated server could theoretically deliver more than 200 teraflops, making it the world's highest-density deep learning system.
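For context, multi-GPU data-parallel training in BVLC Caffe of this era was driven through the command-line tool rather than the Python bindings, with the participating devices selected via a comma-separated list passed to the --gpu flag. The sketch below shows what an eight-GPU launch of that kind looks like; the solver file name is a placeholder, and this illustrates the general mechanism rather than Orange's actual configuration.

```python
import subprocess

# Minimal sketch of an 8-GPU data-parallel Caffe training launch.
# "solver.prototxt" is a placeholder path, not a file from this project;
# Caffe's command-line tool accepts a comma-separated GPU device list.
gpu_list = ",".join(str(i) for i in range(8))  # "0,1,2,3,4,5,6,7"
subprocess.run(
    ["caffe", "train", "--solver=solver.prototxt", f"--gpu={gpu_list}"],
    check=True,  # raise an error if the training process fails
)
```

Scaling beyond this point is harder than it looks: Caffe's synchronous data parallelism requires gradient exchange between all devices each iteration, so communication over the PCIe topology, rather than raw compute, becomes the limiting factor as the GPU count grows toward 20.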

Orange Silicon Valley is a Silicon Valley-based innovation centre for the global telecom operator Orange. The first demonstration of the 20-GPU deep learning server was at SC15, where CocoLink demonstrated the technology in conjunction with Orange on the EchoStreams booth.

In response to the demonstration, Jerome Ladouar, VP of infrastructure, technologies and engineering at Orange, explained the importance of deep learning technology. He said: “It is now possible to run Deep Learning over massive volumes of video data at high speed and also perform contextual analysis over several hundred streams in real time. With our partners, we have prototyped an advanced video analytics capability that could efficiently exploit a supercomputer in a box at the edge of our network; we can thus envision a convergence between AI and Exascale.”

This story appears here as part of a cross-publishing agreement with Scientific Computing World.
