As global data grows exponentially, so does the need for accurate, in-depth analysis and data mining. But processing even a small fraction of that data poses a major resource challenge for most companies and researchers. In this environment, the Intel® Data Analytics Acceleration Library (Intel DAAL) provides an efficient solution.
By now, most large companies, as well as scientific and industrial research organizations, have invested in big data analysis projects that digest years of historical and newly generated data for modeling, prediction, and decision making. Genomics, risk analysis, and social network and consumer preference analysis are just some of the areas where fast processing of massive data sets is a critical business or research requirement.
Intel DAAL is a high-performance library specifically optimized for big data analysis on the latest Intel platforms, including Intel Xeon® and Intel Xeon Phi™. It provides the algorithmic building blocks for all stages of data analysis in offline, batch, streaming, and distributed processing environments. It is designed to perform well across the popular data platforms and APIs in use today, including MPI, Hadoop, Spark, R, MATLAB, Python, C++, and Java.
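To make the difference between batch, streaming, and distributed processing concrete, the sketch below computes a mean and variance three ways: over the whole data set at once, one observation at a time, and by merging partial results from two blocks. It is plain Python and does not use the Intel DAAL API; the names `PartialMoments` and `moments_of` are hypothetical, and the update and merge formulas are the standard Welford and Chan et al. recurrences, chosen here only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PartialMoments:
    """Running count, mean, and sum of squared deviations (m2) for one block.

    Hypothetical helper for illustration; not part of Intel DAAL.
    """
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        # Streaming step (Welford): fold in one observation at a time.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "PartialMoments") -> "PartialMoments":
        # Distributed step (Chan et al.): combine two partial results
        # without revisiting the underlying observations.
        n = self.n + other.n
        if n == 0:
            return PartialMoments()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return PartialMoments(n, mean, m2)

    @property
    def variance(self) -> float:
        # Sample variance (n - 1 denominator); 0.0 for fewer than two points.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def moments_of(block):
    """Stream through one block of observations and return its partial result."""
    pm = PartialMoments()
    for x in block:
        pm.update(x)
    return pm


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Batch: the whole data set is available at once.
batch = moments_of(data)

# Distributed: two "nodes" each see a block; a final step merges them.
node_a = moments_of(data[:3])
node_b = moments_of(data[3:])
combined = node_a.merge(node_b)

print(batch.mean, batch.variance)
print(combined.mean, combined.variance)
```

Both print statements should report the same mean and variance up to floating-point rounding, which is the property that lets the same algorithm run over data that never fits in one place at one time.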
Intel DAAL includes a rich set of algorithms, ranging from basic statistics to more advanced data mining and machine learning methods, that span pre-processing, transformation, analysis, modeling, prediction, and decision making. Intel DAAL gives big data developers the means to build highly optimized code with relatively little effort.
The library itself consists of three major components: Data Management, Algorithms, and Services. The Data Management component provides classes and utilities for acquisition, pre-processing, normalization, and numeric data format conversion. The Algorithms component provides classes that implement methods for data analysis, data mining, modeling, training, and prediction. The Services component provides utilities used within the other two components.
Intel DAAL algorithms for data analysis include:
- Moments of low order and quantiles
- Correlation and variance-covariance matrices
- Distance matrices: cosine and correlation
- K-Means clustering
- Principal component analysis
- Matrix decompositions: Cholesky, singular value, and QR
- Outlier detection: multivariate and univariate
- Association rules
- Sorting observations by features
- Quality metrics for classification algorithms and linear regression
- Optimization solvers
- Normalizations: Z-score and min-max
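As an illustration of the last item in this list, here is a minimal sketch of Z-score and min-max normalization for a single feature. It uses only the Python standard library and is not the Intel DAAL API; the function names `z_score` and `min_max` are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption of this sketch.

```python
from statistics import mean, stdev


def z_score(values):
    """Rescale values to zero mean and unit sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation; needs >= 2 values
    if sigma == 0:
        raise ValueError("z-score is undefined for a constant feature")
    return [(v - mu) / sigma for v in values]


def min_max(values, lo=0.0, hi=1.0):
    """Rescale values linearly so the minimum maps to lo and the maximum to hi."""
    vmin, vmax = min(values), max(values)
    if vmin == vmax:
        raise ValueError("min-max scaling is undefined for a constant feature")
    scale = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * scale for v in values]


feature = [1.0, 2.0, 3.0]
print(z_score(feature))
print(min_max(feature))
```

Z-score normalization suits features that will be compared by distance or fed to methods that assume centered data, while min-max scaling is the usual choice when a fixed output range is required.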
Algorithms for machine learning, training, and prediction include:
- Linear and ridge regressions
- Classification algorithms, including boosting, Naïve Bayes, support vector machines, and multi-class classifiers
- Implicit alternating least squares recommendation system
- Neural network algorithms
These Intel DAAL algorithms, optimized for the latest Intel processors, are detailed in the product documentation.
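To show how linear and ridge regression from the list above relate, the following sketch fits a one-feature model with an unpenalized intercept using the closed-form solution. It is plain Python, not the Intel DAAL API; the function name `fit_ridge_1d` and the parameter name `lam` are hypothetical, and restricting the model to a single feature is a simplification made so the example needs no linear algebra library.

```python
def fit_ridge_1d(xs, ys, lam=0.0):
    """Fit y ~ b + w * x by minimizing sum((y - b - w*x)^2) + lam * w^2.

    With lam = 0 this is ordinary least squares; larger lam shrinks the
    slope w toward zero. The intercept b is not penalized.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    if lam < 0:
        raise ValueError("lam must be non-negative")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centered sums of squares and cross-products.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx + lam == 0:
        raise ValueError("x is constant and lam is 0; slope is undefined")
    w = sxy / (sxx + lam)
    b = y_bar - w * x_bar
    return w, b


def predict(model, x):
    """Apply a fitted (w, b) pair to a new observation."""
    w, b = model
    return b + w * x


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

ols = fit_ridge_1d(xs, ys)             # plain linear regression
ridge = fit_ridge_1d(xs, ys, lam=5.0)  # slope shrunk by the penalty

print(ols, predict(ols, 4.0))
print(ridge, predict(ridge, 4.0))
```

The split into a fitting step and a prediction step mirrors the training and prediction stages the article describes, although a production library would handle many features and much larger data.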
Clearly, big data and machine learning involve some serious math, requiring knowledge from a wide variety of fields. A single application might integrate many complex solutions, increasing development time and risk. Intel DAAL was designed to cover most use cases around big data. It provides all the building blocks a developer needs for all stages of data analytics, from data acquisition through prediction and decision making. It scales from a single node to a large cluster with remote storage without additional effort.
Working together with Intel Math Kernel Library (Intel MKL) and Intel Threading Building Blocks (Intel TBB), Intel DAAL delivers high computational speed, so applications can make better predictions faster and analyze massive data sets with the compute resources available today and those that arrive later. The Intel DAAL and Intel MKL teams work closely with Intel processor architects to provide updates that take advantage of next-generation processors, so your code is ready when new processors become available.
Intel DAAL and Intel MKL libraries are integral parts of Intel Parallel Studio XE on Windows, Linux, and macOS operating systems.