This is the first in a five-part series exploring the potential of unified deep learning with CPU, GPU and FPGA technologies. This post introduces the deep learning problem and the machine learning potential of combining these technologies.
Deep learning and complex machine learning have quickly become some of the most important computationally intensive applications across a wide variety of fields. The combination of large data sets, high-performance computational capabilities, and steadily improving algorithms has enabled many successful applications that were previously difficult or impossible to attempt.
This series explores the challenges of deep learning training and inference, and discusses the benefits of a comprehensive approach for combining CPU, GPU, and FPGA technologies, along with the appropriate software frameworks in a unified deep learning architecture. Each of these hardware technologies offers unique benefits to the deep learning problem, and a properly designed system can take advantage of this combination. Moreover, the combination can provide unique capabilities that result in higher performance, better efficiency, greater flexibility, and a hedge against algorithm obsolescence compared to CPU/GPU and FPGA systems designed separately.
Aside from the underlying hardware approaches, a unified software environment is necessary to provide a clean interface to the application layer. This needs to account for several factors, including framework support, different compiler and code generator technologies, and optimization support for the underlying hardware engines. Higher-level frameworks (e.g., TensorFlow, Theano) can effectively hide most heterogeneity from application developers as well as enable portability across different systems. This is a powerful enabler for heterogeneous hardware. For application developers working below the framework level, the AMD ROCm and MIOpen software frameworks are discussed as an example of a unified software environment applicable to a CPU and GPU solution. FPGAs are primarily used for inference, and the xfDNN middleware from Xilinx captures the software features essential for implementing deep learning inference on FPGAs.
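As a rough illustration of how a high-level framework hides device details, the minimal TensorFlow sketch below defines, trains, and runs the same model regardless of which devices the installed backend (CPU, GPU via ROCm or CUDA, or another supported accelerator) exposes. The model shape, layer sizes, and synthetic data are arbitrary choices for illustration, not anything prescribed by the report.

```python
# Minimal sketch: the same model definition runs on whatever devices
# the installed TensorFlow backend exposes; placement is handled by the
# framework, not by the application code.
import numpy as np
import tensorflow as tf

# The framework reports the devices it can dispatch to; application code
# does not need to change when this list changes.
print("Visible devices:", tf.config.list_physical_devices())

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(64,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

# Synthetic data stands in for a real training set.
x = np.random.rand(256, 64).astype("float32")
y = np.random.randint(0, 10, size=(256,))
model.fit(x, y, epochs=1, batch_size=32, verbose=0)
```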
A long-term vision for application developers is a full and seamless programming environment that works across CPUs, GPUs, and FPGAs. This could initially focus on support for a common language and runtime, such as OpenCL, and later be extended to additional languages. The language support would hide any internal differences in compilers and runtimes between the CPU, GPU, and FPGA implementations. This seamless programming environment will facilitate the full end-to-end optimization of resource allocation.
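To sketch what a common runtime already exposes today, the snippet below uses the pyopencl bindings (an assumed tooling choice, not something the report specifies) to enumerate whatever OpenCL platforms and devices a system provides. CPUs, GPUs, and FPGAs with OpenCL runtimes installed all appear through the same API; the platform and device names will vary by system.

```python
# Sketch: enumerate OpenCL platforms and devices through one common API.
# CPU, GPU, and FPGA devices with installed OpenCL runtimes all show up here.
import pyopencl as cl

for platform in cl.get_platforms():
    print(f"Platform: {platform.name} ({platform.vendor})")
    for device in platform.get_devices():
        print(f"  Device: {device.name}")
        print(f"    Type: {cl.device_type.to_string(device.type)}")
        print(f"    Compute units: {device.max_compute_units}")
        print(f"    Global memory: {device.global_mem_size // (1024 ** 2)} MiB")
```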
Deep learning has emerged as the most effective method for learning to recognize and classify objects, speech, and other types of information. A brief review of deep learning is useful, although there are many good references that cover the historical background and state of the art. The main purpose here is to illustrate the compute and data management challenges involved in implementing successful deep learning systems.
Machine Learning Background
Some basic terms commonly used in this field are defined below:
Machine Intelligence (MI): A program that can respond to inputs, analyze data, recognize patterns, or develop strategies, in ways that would be considered intelligent if done by a human. For our purposes, we consider machine intelligence (MI) and artificial intelligence (AI) to be interchangeable terms.
Machine Learning (ML): A subset of machine intelligence algorithms that improve their performance over time, typically when they are exposed to more data.
Neural Network (NN): A data structure that represents artificial neurons interconnected by artificial synapses having various weights or strengths, similar to biological neural networks present in the brain. A neural network is “trained” by adjusting the weights of the various artificial synapses so that the network produces a desired output for various input data.
Deep Neural Networks (DNN): Multilayered neural networks with a large number of hidden layers used by deep learning algorithms. Two commonly used DNN variations are convolutional neural networks (CNN) and recurrent neural networks (RNN). Typically, CNNs are used for image-processing tasks, whereas RNNs are used for speech and natural language processing tasks.
Deep Learning (DL): Machine learning algorithms with multilayered neural networks that learn from exposure to vast amounts of data.
DL Training: Using a set of training sample data to determine the optimal weights of the artificial neurons in a DNN. Modern DL models use a deep network with hidden layers and a process called stochastic gradient descent to train the network. To achieve acceptable accuracy, many training samples are needed in the training process. DL training times can range from days to weeks for large training sets involving millions of data samples.
DL Inference: Analyzing specific data using a previously trained DNN. In most applications, inference latency is less than a second. DNNs trained for image classification have improved to the point that they match (or in some cases exceed) human classification accuracy.
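To make the training/inference distinction concrete, the following NumPy sketch trains a toy two-layer network with stochastic gradient descent on synthetic labeled samples and then applies the learned weights to unseen data. The network size, learning rate, and labeling rule are arbitrary illustrations, not a production workflow.

```python
# Toy sketch of the training/inference split: training adjusts weights with
# stochastic gradient descent on labeled samples; inference simply applies
# the learned weights to new data.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((16, 8)) * 0.1, np.zeros(8)   # hidden layer
W2, b2 = rng.standard_normal((8, 1)) * 0.1, np.zeros(1)    # output layer

def forward(x):
    h = np.maximum(x @ W1 + b1, 0.0)                      # ReLU hidden activation
    return h, 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))        # sigmoid output

# --- Training: SGD over synthetic labeled samples ----------------------
lr = 0.1
for _ in range(1000):
    x = rng.standard_normal((1, 16))
    y = np.array([[float(x.sum() > 0)]])                  # synthetic label
    h, p = forward(x)
    grad_out = p - y                                      # cross-entropy gradient at the logit
    grad_W2 = h.T @ grad_out
    grad_h = (grad_out @ W2.T) * (h > 0)
    grad_W1 = x.T @ grad_h
    W2 -= lr * grad_W2; b2 -= lr * grad_out.ravel()
    W1 -= lr * grad_W1; b1 -= lr * grad_h.ravel()

# --- Inference: apply the trained weights to unseen data ---------------
x_new = rng.standard_normal((1, 16))
_, prediction = forward(x_new)
print("Predicted probability:", float(prediction))
```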
Over the next few weeks, this series will also cover the following topics:
- Exploring the Potential of Deep Learning Software
- The Ins and Outs of DNN Implementation and Optimization
- Computational Approaches – CPU, GPU, and FPGA
- Unified Deep Learning Configurations and Emerging Applications
You can download the full report here, courtesy of AMD and Xilinx, “Unified Deep Learning with CPU, GPU and FPGA technology.”