AI and HPC: Inferencing, Platforms & Infrastructure


This feature continues our series of articles that survey the landscape of HPC and AI. This post focuses on inferencing, platforms, and infrastructure at the convergence of HPC and AI.

Inferencing is the operation that makes data-derived models valuable, because they can predict the future and perform recognition tasks better than humans. Inferencing works because once the model is trained (meaning the bumpy surface has been fitted), the ANN can interpolate between known points on the surface to correctly make predictions for data points it has never seen before, that is, points that were not in the original training data.

Without getting too technical, during inferencing ANNs perform this interpolation on a nonlinear (bumpy) surface, which means that ANNs can perform better than the straight-line interpolation of a conventional linear method. Further, ANNs are also able to extrapolate from known points on the fitted surface to make predictions for data points that lie outside the range of data the ANN saw during training. This works because the surface being fitted is continuous. Thus, people say the trained model has generalized the data so it can make correct predictions.
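To make the interpolation idea concrete, here is a minimal sketch using scikit-learn and NumPy. Everything in it is an illustrative assumption rather than a detail from this article: the sine function stands in for the "bumpy surface," and the network size and sample counts are arbitrary. A small ANN is fitted to sparse samples of a nonlinear function and then queried at points it never saw during training:

```python
# A minimal sketch of ANN interpolation, assuming scikit-learn and NumPy
# are installed. The function, network size, and sample counts are
# illustrative choices, not taken from the article.
import numpy as np
from sklearn.neural_network import MLPRegressor

# Training data: sparse samples from a nonlinear ("bumpy") surface.
x_train = np.linspace(0.0, 2.0 * np.pi, 25).reshape(-1, 1)
y_train = np.sin(x_train).ravel()

# Fit a small ANN; training adjusts the weights so the network's
# output surface passes close to the known points.
model = MLPRegressor(hidden_layer_sizes=(32, 32), activation="tanh",
                     solver="lbfgs", max_iter=5000, random_state=0)
model.fit(x_train, y_train)

# Inferencing: query points the model never saw. Because the fitted
# surface is continuous, the network interpolates between the known
# training points to produce sensible predictions.
x_new = np.array([[0.7], [2.3], [4.1]])
print("ANN predictions:", model.predict(x_new))
print("True values:   ", np.sin(x_new).ravel())
```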

Don’t limit your thinking about what AI can do

The popularity of “better than human” classification on tasks that people do well (such as recognizing faces, Internet image search, self-driving cars, and so on) has reached a mass audience. What has been lacking is coverage of the fact that machine learning is also fantastic at performing tasks that humans tend to do poorly.

For example, machine learning can be orders of magnitude better at signal processing and at nonlinear system modeling and prediction than other methods. As a result, machine learning has been used for decades to model electrochemical systems, perform bifurcation analysis, perform Independent Component Analysis, and much, much more.

Similarly, an autoencoder can be used to efficiently encode data using analogue encoding (which can be much more compact than traditional digital compression methods), perform PCA (Principal Components Analysis) and NLPCA (Nonlinear Principal Components Analysis), and perform dimension reduction. Many of these methods are part and parcel of the data scientist’s toolbox.

Dimension reduction is of particular interest to anyone who uses a database.

Basically, an autoencoder addresses the curse of dimensionality problem, in which every point effectively becomes a nearest neighbor of every other point in the database as the dimension of the data increases. People quickly discover that they can put high-dimensional data into a database, but their queries return either no data or all the data. An autoencoder trained to represent the high-dimensional data in a lower dimension with low error preserves the relationships between the data points, allowing database searches to find similar items in the lower-dimensional space. In other words, autoencoders can allow people to perform meaningful searches on high-dimensional data, which can be a big win for many people. Autoencoders are also very useful for those who wish to visualize high-dimensional data.
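As a hedged illustration of how this works, here is a minimal sketch using Keras and scikit-learn. The synthetic data, the 64-dimensional records, the 3-dimensional bottleneck, and the layer sizes are all illustrative assumptions, not details from this article:

```python
# A minimal sketch, assuming TensorFlow/Keras and scikit-learn are
# installed. The data, dimensions, and layer sizes are illustrative
# assumptions, not values from the article.
import numpy as np
from tensorflow import keras
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Synthetic "high-dimensional" records that actually live near a
# lower-dimensional manifold (3 latent factors lifted to 64-D).
latent = rng.normal(size=(5000, 3))
lift = rng.normal(size=(3, 64))
X = np.tanh(latent @ lift) + 0.01 * rng.normal(size=(5000, 64))

# Autoencoder: 64-D input squeezed through a 3-D bottleneck and
# reconstructed. Low reconstruction error means the low-dimensional
# codes preserve the relationships between the data points.
inp = keras.Input(shape=(64,))
h = keras.layers.Dense(16, activation="relu")(inp)
code = keras.layers.Dense(3, name="bottleneck")(h)
h = keras.layers.Dense(16, activation="relu")(code)
out = keras.layers.Dense(64)(h)
autoencoder = keras.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=20, batch_size=128, verbose=0)

# The encoder alone maps records into the 3-D search space.
encoder = keras.Model(inp, autoencoder.get_layer("bottleneck").output)
codes = encoder.predict(X, verbose=0)

# Nearest-neighbor queries are meaningful again in the reduced space.
nn = NearestNeighbors(n_neighbors=5).fit(codes)
dist, idx = nn.kneighbors(encoder.predict(X[:1], verbose=0))
print("Indices of the 5 most similar records:", idx[0])
```

The similarity query runs in the 3-D code space, where distances are meaningful again; the same codes can also be plotted directly to visualize the high-dimensional data.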


Platform perceptions are changing

Popular perceptions about the hardware requirements for deep learning are changing as CPUs continue to be the desired hardware training platform in the data center. The reason is that CPUs deliver the required parallelism plus the memory and cache performance to support the massively parallel, FLOP/s-intensive nature of training, yet they can also efficiently support both general-purpose workloads and machine learning data-preprocessing workloads. Thus, data scientists and data centers are converging on similar hardware solutions, namely hardware that can meet all their needs rather than accelerate just one aspect of machine learning. This reflects recent data center procurement trends like MareNostrum 4 at Barcelona Supercomputing Center and the TACC Stampede2 upgrade, both of which were established to provide users with general workload support.


‘Lessons Learned’: Pick a platform that supports all your workloads

In particular, don’t ignore data preprocessing, as the extraction and cleaning of training data from unstructured data sources can be as big a computational problem as the training process itself. Most deep learning articles and tutorials neglect this “get your hands dirty with the data” issue, but the importance of data preprocessing cannot be overstated!
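As a hedged illustration of what this step looks like in practice, here is a minimal preprocessing sketch with pandas and scikit-learn. The input file raw_records.csv and all column names are hypothetical:

```python
# A minimal preprocessing sketch, assuming pandas and scikit-learn are
# installed. The file name and column names are hypothetical examples.
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Raw data is rarely ready for training: duplicates, mixed types, and
# missing values all have to be cleaned out first.
df = pd.read_csv("raw_records.csv")  # hypothetical input file
df = df.drop_duplicates()
df["temperature"] = pd.to_numeric(df["temperature"], errors="coerce")
df = df.dropna(subset=["temperature", "label"])

# One-hot encode categoricals and scale numeric features so the
# optimizer sees well-conditioned inputs during training.
X = pd.get_dummies(df.drop(columns=["label"]), columns=["site"])
X[["temperature"]] = StandardScaler().fit_transform(X[["temperature"]])
y = df["label"].to_numpy()
```

Even this toy pipeline touches every record several times; at data center scale, such passes over the data can rival the training run itself.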


Data scientists tend to spend most of their time working with the data. Most are not ninja programmers, so support for the productivity languages they are familiar with is critical to having them work effectively. Beyond that, the hardware must provide large memory capacity and a fast solid-state storage subsystem to meet time-to-model performance requirements.

Thus, the popularity of AI today reflects the convergence of scalable algorithms, distributed training frameworks, hardware, software, data preprocessing, and productivity languages, so people can use deep learning to address their computational problems regardless of how much data they might have, and regardless of what computing platform they may have.

Integrating AI into infrastructure and applications

Intel is conducting research to help bridge the gap and bring about the much-needed HPC-AI convergence. The IPCC collaboration that achieved 15 PF/s of performance using 9,600 Intel Xeon Phi processor-based nodes on the NERSC Cori supercomputer is one example.

An equally important challenge in the convergence of HPC and AI is the gap between programming models. HPC programmers can be “parallel programming ninjas,” but deep learning and machine learning are mostly programmed using MATLAB-like frameworks. The AI community is evolving to address the challenge of delivering scalable, HPC-like performance for AI applications without the need to train data scientists in low-level parallel programming.


Further, vendors like Intel are addressing the software issue at the levels of library, language, and runtime. More specifically, in collaboration with two university partners, Intel has achieved a significant (more than 10x) performance improvement through library calls and also by enabling MPI libraries to be efficiently callable from Apache Spark, an approach described in Bridging the Gap Between HPC and Big Data Frameworks at the Very Large Data Bases Conference (VLDB) earlier this year. Additionally, Intel, in collaboration with Julia Computing and MIT, has managed to significantly speed up Julia programs both at the node level and on clusters. Underneath the source code, ParallelAccelerator and the High Performance Analytics Toolkit (HPAT) turn programs written in productivity languages (such as Julia and Python) into highly performant code. These have been released as open source projects to help those in academia and industry push advanced AI runtime capabilities even further.
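ParallelAccelerator and HPAT have their own interfaces, so as an analogous, hedged illustration of the general idea, compiling loops written in a productivity language into parallel native code, here is a minimal sketch using Numba, a different but widely available Python JIT compiler:

```python
# Not HPAT or ParallelAccelerator: a hedged sketch of the same idea
# using Numba, a widely available JIT compiler that turns numerical
# Python loops into parallel native code.
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def row_norms(a):
    # Each outer iteration runs in parallel across CPU cores.
    out = np.empty(a.shape[0])
    for i in prange(a.shape[0]):
        s = 0.0
        for j in range(a.shape[1]):
            s += a[i, j] * a[i, j]
        out[i] = np.sqrt(s)
    return out

x = np.random.rand(1_000_000, 8)
print(row_norms(x)[:5])  # first call compiles; later calls run at native speed
```

The data scientist writes an ordinary Python loop; the compiler, not the programmer, handles the low-level parallelization, which is the gap-bridging approach the projects above pursue.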

The insideHPC Special Report on AI-HPC will also cover additional topics over the next few weeks.

Download the full report: “insideHPC Special Report: AI-HPC is Happening Now” courtesy of Intel.