When used in a TBB environment, Intel has demonstrated a many-fold performance improvement over the same parallelized code using Intel MKL in an OpenMP environment. Intel TBB-enabled Intel MKL is ideal when there is heavy threading in the Intel TBB application. Intel TBB-enabled Intel MKL shows solid performance improvements through better interoperability with other parts of the workload.
Intel DAAL is a high-performance library specifically optimized for big data analysis on the latest Intel platforms, including Intel Xeon®, and Intel Xeon Phi™. It provides the algorithmic building blocks for all stages in data analysis in offline, batch, streaming, and distributed processing environments. It was designed for efficient use over all the popular data platforms and APIs in use today, including MPI, Hadoop, Spark, R, MATLAB, Python, C++, and Java.
“Many of the libraries developed in the 70s and 80s for core linear algebra and scientific math computation, such as BLAS, LAPACK, FFT, are still in use today with C, C++, Fortran, and even Python programs. With MKL, Intel has engineered a ready-to-use, royalty-free library that implements these numerical algorithms optimized specifically to take advantage of the latest features of Intel chip architectures. Even the best compiler can’t compete with the level of performance possible from a hand-optimized library. Any application that already relies on the BLAS or LAPACK functionality will achieve better performance on Intel and compatible architectures just by downloading and re-linking with Intel MKL.”
While HPC developers worry about squeezing out the ultimate performance while running an application on dedicated cores, Intel TBB tackles a problem that HPC users never worry about: How can you make parallelism work well when you share the cores that you run upon?” This is more of a concern if you’re running that application on a many-core laptop or workstation than a dedicated supercomputer because who knows what will also be running on those shared cores. Intel Threaded Building Blocks reduce the delays from other applications by utilizing a revolutionary task-stealing scheduler. This is the real magic of TBB.
Libraries that are tuned to the underlying hardware architecture can increase performance tremendously. Higher level libraries such at the Intel Data Analytics Acceleration Library (Intel DAAL) can assist the developer with highly tuned algorithms for data analysis as well as machine learning. Intel DAAL functions can be called within other, more comprehensive frameworks that deal with the various types of data and storage, increasing the performance and lowering the development time of a wide range of applications.
With the introduction of the Intel Scalable System Framework, the Intel Xeon Phi processor can speed up Finite Element Analysis significantly. Using highly tuned math libraries such as the Intel Math Kernel Library (Intel MKL), FEA applications can execute math routines in parallel on the Intel Xeon Phi processor.
The ability to develop applications independent of the hardware availability at run time is a very important concept that enables developers to take advantage of the latest and greatest processing and coprocessing power. Without having to make run time checks on hardware availability is critical to a smooth running HPC environment.
The use of vector instructions can speed up applications tremendously when used correctly. The benefit is that much more work can be done in a clock cycle than by performing the operation one at a time. The Intel Xeon Phi coprocessor was designed with strong support for vector level parallelism. “When these techniques are used either individually or in combination in different areas of the application, the performance will surely be increased, in many cases without a lot of effort.”