“The artificial intelligence race is on,” said Jen-Hsun Huang, co-founder and CEO of NVIDIA. “Machine learning is unquestionably one of the most important developments in computing today, on the scale of the PC, the internet and cloud computing. Industries ranging from consumer cloud services to automotive and health care are being revolutionized as we speak. Machine learning is the grand computational challenge of our generation. We created the Tesla Hyperscale Accelerator line to give machine learning a 10X boost. The time and cost savings to data centers will be significant.”
In the past few years, accelerated computing has become strategically important for a wide range of applications. To gain performance on a variety of codes, hardware and software developers have concentrated their efforts on creating systems that accelerate certain applications by a significant amount compared to what was previously possible.
VDI, or Virtual Desktop Infrastructure, helps companies save money, time, and resources. Instead of putting a large, bulky machine on every desk in the office, companies can connect multiple workstations to a single computer using thin clients. Instead of replacing individual desktops every year, companies only have to replace thin clients every five years. And when it is time to apply updates, the IT staff updates the one central computer instead of spending time on every individual workstation.
“Deep neural networks are increasingly important for powering AI-based applications like speech recognition. Baidu’s research shows that adding GPUs to the data center makes deploying big deep neural networks practical at scale. Deep learning based technologies benefit from batching user requests in the data center, which requires a different software architecture than traditional web applications.”
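The batching idea in the quote above can be sketched in a few lines of NumPy. This is a toy illustration, not Baidu's architecture: the "network" is a single made-up dense layer, and the layer sizes and request count are arbitrary. The point is that stacking several user requests into one matrix turns many small matrix-vector products into one large matrix-matrix product, which hardware like GPUs executes far more efficiently.

```python
import numpy as np

# Toy "network": one dense layer with weights W (sizes are hypothetical).
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128))

def serve_one(x):
    """Handle a single request: one matrix-vector product."""
    return W @ x

def serve_batch(X):
    """Handle a batch of requests stacked as columns: one matrix-matrix product."""
    return W @ X

# Eight simulated user requests arriving within a batching window.
requests = [rng.standard_normal(128) for _ in range(8)]

one_by_one = np.stack([serve_one(x) for x in requests], axis=1)
batched = serve_batch(np.stack(requests, axis=1))

# The batched path returns the same answers with a single large GEMM.
assert np.allclose(one_by_one, batched)
```

As the quote notes, this is what requires a different serving architecture: requests must be collected and dispatched together rather than handled one at a time as a traditional web server would.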
“Our goal is to enable HPC developers to easily port applications across all major CPU and accelerator platforms with uniformly high performance using a common source code base,” said Douglas Miles, director of PGI Compilers & Tools at NVIDIA. “This capability will be particularly important in the race towards exascale computing, in which there will be a variety of system architectures requiring a more flexible application programming approach.”
A successful example of how a well-managed GPU cluster lets scientists focus on obtaining results comes from the Tokyo University of Agriculture and Technology (TUAT). A research group led by Dr. Akinori Yamanaka develops computational models and simulates engineering materials for a variety of applications using HPC. With Bright Cluster Manager, Dr. Yamanaka and his team were able to focus immediately on algorithm development rather than burdening the team with cluster administration.
Training the neural networks used in deep learning is an ideal task for GPUs because GPUs can perform many calculations at once (parallel calculations), so training takes far less time than it once did. More GPUs mean more computational power: a system with multiple GPUs can process data much faster than a CPU-only system or a system with a single GPU. One Stop Systems' High Density Compute Accelerator is the densest GPU expansion system to date.
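Why more GPUs help can be sketched with data parallelism, the common multi-GPU training pattern: each device computes gradients on its own shard of the batch, and the partial gradients are combined into the full-batch gradient. The NumPy sketch below simulates this with a simple least-squares model; the four "devices", data sizes, and loss are illustrative assumptions, not any vendor's implementation.

```python
import numpy as np

# Data-parallel sketch: each simulated "device" computes the gradient
# on its shard of the batch; weighted partial gradients sum to the
# full-batch gradient, as in multi-GPU training.
rng = np.random.default_rng(1)
X = rng.standard_normal((1024, 16))   # inputs (made-up sizes)
y = rng.standard_normal(1024)         # targets
w = np.zeros(16)                      # model weights

def grad(Xs, ys, w):
    """Gradient of mean squared error on one shard of data."""
    return 2 * Xs.T @ (Xs @ w - ys) / len(ys)

n_devices = 4                         # pretend GPUs
shards = zip(np.array_split(X, n_devices), np.array_split(y, n_devices))

# Each shard's mean gradient is re-weighted by its size before summing.
partial = [grad(Xs, ys, w) * len(ys) for Xs, ys in shards]
g_parallel = sum(partial) / len(y)

# The combined result matches the single-device full-batch gradient.
assert np.allclose(g_parallel, grad(X, y, w))
```

Because the shard gradients are independent, the four computations can run simultaneously, which is the source of the near-linear speedup the paragraph above describes.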
Professor Taisuke Boku from the University of Tsukuba presented this talk at the PBS User Group. “We have been operating HA-PACS, a large-scale GPU cluster with 332 compute nodes and 1,328 GPUs, managed by the PBS Professional scheduler. The users span a wide variety of computational science fields, with resource requests ranging from a single node to full-scale parallel runs. There are also several categories of user groups, with both paid and free scientific projects. Operating such a large system while maintaining a high utilization rate and fairness across these user groups is challenging. We have successfully kept job utilization above 85-90% under multiple constraints.”