Speeding Up Big Data Analysis With Intel MKL and Intel DAAL

“New algorithms that can query massive amounts of data and draw conclusions have been developed, but these algorithms need to be optimized for the underlying hardware. This is where the expertise of vendors who develop the hardware can add tremendous value. Optimizing the underlying libraries that can execute with a high degree of parallelism will definitely lead to improved performance for the software and productivity gains for the organization.”

Go with Intel® Data Analytics Acceleration Library and Go*

Use of the Go* programming language and its developer community has grown significantly since its official launch by Google in 2009. Like many popular programming languages (C and Java come to mind), Go started as an experiment to design a new programming language that would fix some of the common problems of other languages and yet stay true to the basic tenets of modern programming: be scalable, productive, readable, enable robust development environments, and support networking and multiprocessing.

Performance Gains Using Libraries

In many cases, applications that perform simulations rely on the same math functions that many other applications use. Rather than each developer recoding those functions over and over, libraries developed by experts can significantly speed up execution of the overall application. Because those experts understand hardware nuances that make many optimizations possible, it is important that developers be familiar with the libraries available for HPC applications.

Intel DAAL Accelerates Data Analytics and Machine Learning

Intel DAAL is a high-performance library specifically optimized for big data analysis on the latest Intel platforms, including Intel Xeon® and Intel Xeon Phi™. It provides the algorithmic building blocks for all stages of data analysis in offline, batch, streaming, and distributed processing environments. It was designed for efficient use with all the popular data platforms and APIs in use today, including MPI, Hadoop, Spark, R, MATLAB, Python, C++, and Java.

A Decade of Multicore Parallelism with Intel TBB

While HPC developers worry about squeezing out the ultimate performance while running an application on dedicated cores, Intel TBB tackles a problem that HPC users never worry about: “How can you make parallelism work well when you share the cores that you run upon?” This is more of a concern if you’re running that application on a many-core laptop or workstation than on a dedicated supercomputer, because who knows what else will be running on those shared cores. Intel Threading Building Blocks reduces the delays caused by other applications by using a revolutionary task-stealing scheduler. This is the real magic of TBB.
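To make the idea of task stealing concrete, here is a minimal single-threaded simulation in Python. It is only a sketch of the general technique, not Intel TBB's scheduler or API: the function name `run_with_work_stealing`, the lock-step rounds, and the "steal from the fullest queue" rule are all assumptions made for illustration. Each worker takes the newest task from the back of its own queue, and a worker with nothing to do takes the oldest task from the front of another worker's queue.

```python
from collections import deque


def run_with_work_stealing(tasks, num_workers):
    """Simulate work stealing in lock-step rounds.

    Returns one list per worker holding the tasks that worker ran.
    Hypothetical helper for illustration; not part of any Intel library.
    """
    # Every worker owns a double-ended queue. All work starts on worker 0
    # to mimic an unbalanced load.
    queues = [deque() for _ in range(num_workers)]
    queues[0].extend(tasks)
    executed = [[] for _ in range(num_workers)]

    while any(queues):
        for worker in range(num_workers):
            if queues[worker]:
                # The owner takes the newest task from the back of its queue.
                task = queues[worker].pop()
            else:
                # An idle worker steals the oldest task from the front of
                # the fullest queue, if any work remains.
                victim = max(range(num_workers), key=lambda w: len(queues[w]))
                if not queues[victim]:
                    continue
                task = queues[victim].popleft()
            # "Running" a task here just records which worker handled it.
            executed[worker].append(task)
    return executed
```

Calling `run_with_work_stealing(list(range(8)), 4)` spreads the eight tasks across all four workers even though they all started on worker 0. That is the property that matters when some cores are busy with other applications: idle workers pull work toward themselves instead of waiting.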

Optimizing Your Code for Big Data

Libraries that are tuned to the underlying hardware architecture can increase performance tremendously. Higher level libraries such as the Intel Data Analytics Acceleration Library (Intel DAAL) can assist the developer with highly tuned algorithms for data analysis as well as machine learning. Intel DAAL functions can be called within other, more comprehensive frameworks that deal with the various types of data and storage, increasing the performance and lowering the development time of a wide range of applications.

Better Software For HPC through Code Modernization

Vectorization and threading are critical to using innovative hardware products such as the Intel Xeon Phi processor. Using tools early in the design and development process that identify where vectorization can be used or improved will lead to increased performance of the overall application. Modern tools can be used to determine what might be blocking compiler vectorization and the potential gain from the work involved.

Offloading vs Native Execution on Intel Xeon Phi Coprocessors

“Native execution is good for applications that perform operations that map to parallelism either in threads or vectors. However, running natively on the coprocessor is not ideal when the application must do a lot of I/O or runs large parts of the application in a serial mode. Offloading has its own issues. Asynchronous allocation, copies, and the deallocation of data can be performed, but it is complex. Another challenge with offloading is that it requires memory blocking. Overall, it is important to understand the application, the workflow within the application and how to use the Intel Xeon Phi coprocessor most effectively.”

Video: Speeding Up Code with the Intel Distribution for Python

David Bolton from Slashdot shows how ‘embarrassingly parallel’ code can be sped up over 2000x (not percent) by utilizing Intel tools including the Intel Python compiler and OpenMP. “The Intel Distribution for Python* 2017 Beta program is now available. The Beta product adds new Python packages like scikit-learn, mpi4py, numba, conda, tbb (Python interfaces to Intel Threading Building Blocks) and pyDAAL (Python interfaces to Intel Data Analytics Acceleration Library).”
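For readers unfamiliar with the term, "embarrassingly parallel" means the work splits into chunks that never need to talk to each other. The sketch below shows that shape using only Python's standard library. It is not the code from the video and uses none of the Intel tools mentioned above; the prime-counting workload and the names `count_primes`, `split_range`, and `parallel_count` are assumptions chosen for illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(bounds):
    """Count primes in [lo, hi) by trial division; needs no shared state."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        # n is prime if no d in 2..sqrt(n) divides it evenly.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def split_range(limit, parts):
    """Split [0, limit) into contiguous (lo, hi) chunks (limit must be > 0)."""
    step = -(-limit // parts)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def parallel_count(limit, parts, executor_cls=ProcessPoolExecutor):
    """Count primes below limit, handing one independent chunk to each task.

    ThreadPoolExecutor can be passed as executor_cls for a quick functional
    check; a process pool is the default because the work is CPU-bound.
    """
    with executor_cls(max_workers=parts) as pool:
        # Chunks are independent, so partial counts are simply summed.
        return sum(pool.map(count_primes, split_range(limit, parts)))
```

When using the default process pool from a script, call `parallel_count` under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Any speedup depends on core count and chunk sizes, and this sketch makes no attempt to reproduce the figures quoted above.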

Using Vectors on Intel Xeon Phi

Vector instructions can speed up applications tremendously when used correctly. The benefit is that much more work can be done in a clock cycle than by performing the operations one at a time. The Intel Xeon Phi coprocessor was designed with strong support for vector level parallelism. “When these techniques are used either individually or in combination in different areas of the application, the performance will surely be increased, in many cases without a lot of effort.”