Data Analytics Requires New Libraries

A convergence in the fields of High Performance Computing (HPC) and Big Data has led to new opportunities for software developers to create and deliver products that help analyze very large amounts of data. Over the years, the HPC software ecosystem has created and maintained sets of numerical libraries, communication APIs (such as MPI), and applications that make HPC-type applications faster to run and simpler to design. Low-level libraries have been developed so that developers can concentrate on higher-level algorithms. Products such as the Intel Math Kernel Library (Intel MKL) have been highly tuned to take advantage of multiple cores and newer instruction sets.

With Big Data applications becoming increasingly common, an opportunity arose to provide lower-level libraries that offer very high performance on modern systems. Intel recently released the Intel Data Analytics Acceleration Library (Intel DAAL). With a library like the Intel DAAL, developers can build their applications faster without having to be concerned with the lower-level computational details. With a highly tuned library, larger datasets can be analyzed in a shorter time than with unoptimized code. In addition, as new features are released on new chips, an updated library can be downloaded and linked, and the application will be ready to run, taking advantage of the new instructions or additional cores. Because the library developers work side by side with the chip architects, more performance can be squeezed out of Intel CPUs and coprocessors.

The Intel DAAL addresses all stages of the data analytics pipeline: Pre-processing, Transformation, Analysis, Modeling, Validation, and Decision Making. Various operations are available for each of these stages, such as decompression, filtering, and normalization for the Pre-processing stage. The target processors for which the Intel DAAL is optimized include the Intel Atom, Core, and Xeon processors, and the Intel Xeon Phi coprocessor. The Intel DAAL is compatible with the Microsoft, GCC, and Intel compilers, and with the C, C++, C#, Fortran, Java, and ASM languages. Operating system support is provided for Linux, Windows, and OS X.
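
To make the Pre-processing stage more concrete, here is a minimal sketch of filtering followed by normalization in plain Python. It does not use the Intel DAAL API. The function names `filter_missing` and `normalize` and the sample data are hypothetical, chosen only for illustration. A tuned library would perform the same kind of operations with vectorized, multi-threaded code.

```python
import math


def filter_missing(rows):
    """Filtering step: drop any row that contains None or NaN."""
    return [
        row for row in rows
        if all(v is not None and not math.isnan(v) for v in row)
    ]


def normalize(rows):
    """Normalization step: z-score each column (population standard deviation)."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n)
        for col, m in zip(columns, means)
    ]
    # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical sample data: two features per observation, some values missing.
raw = [[1.0, 10.0], [2.0, float("nan")], [3.0, 30.0], [None, 5.0]]
clean = filter_missing(raw)
scaled = normalize(clean)
```

After these two steps, each remaining column has zero mean and unit variance, which is a common starting point for later stages such as Analysis or Modeling.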

It is encouraging to see that the ecosystem for big data software is in development.
