While HPC developers worry about squeezing out the ultimate performance while running an application on dedicated cores, Intel TBB tackles a problem that HPC users never worry about: How can you make parallelism work well when you share the cores that you run upon?” This is more of a concern if you’re running that application on a many-core laptop or workstation than a dedicated supercomputer because who knows what will also be running on those shared cores. Intel Threaded Building Blocks reduce the delays from other applications by utilizing a revolutionary task-stealing scheduler. This is the real magic of TBB.
Libraries that are tuned to the underlying hardware architecture can increase performance tremendously. Higher level libraries such at the Intel Data Analytics Acceleration Library (Intel DAAL) can assist the developer with highly tuned algorithms for data analysis as well as machine learning. Intel DAAL functions can be called within other, more comprehensive frameworks that deal with the various types of data and storage, increasing the performance and lowering the development time of a wide range of applications.
Vectorization and threading are critical to using such innovative hardware product such as the Intel Xeon Phi processor. Using tools early in the design and development processor that identify where vectorization can be used or improved will lead to increased performance of the overall application. Modern tools can be used to determine what might be blocking compiler vectorization and the potential gain from the work involved.
“Native execution is good for application that are performing operations that map to parallelism either in threads or vectors. However, running natively on the coprocessor is not ideal when the application must do a lot of I/O or runs large parts of the application in a serial mode. Offloading has its own issues. Asynchronous allocation, copies, and the deallocation of data can be performed but it complex. Another challenge with offloading is that it requires memory blocking. Overall, it is important to understand the application, the workflow within the application and how to use the Intel Xeon Phi coprocessor most effectively.”
David Bolton from Slashdot shows how ‘embarrassingly parallel’ code can be sped up over 2000x (not percent) by utilizing Intel tools including the Intel Python compiler and OpenMP. “The Intel Distribution for Python* 2017 Beta program is now available. The Beta product adds new Python packages like scikit-learn, mpi4py, numba, conda, tbb (Python interfaces to Intel Threading Building Blocks) and pyDAAL (Python interfaces to Intel Data Analytics Acceleration Library). “
The use of vector instructions can speed up applications tremendously when used correctly. The benefit is that much more work can be done in a clock cycle than by performing the operation one at a time. The Intel Xeon Phi coprocessor was designed with strong support for vector level parallelism. “When these techniques are used either individually or in combination in different areas of the application, the performance will surely be increased, in many cases without a lot of effort.”
In this video from the HPC User Forum in Tucson, Gary Paek from Intel presents: Intel’s Machine Learning Strategy. “Earlier this week, Intel announced the inception of the Intel Data Analytics Acceleration Library (Intel DAAL) open source project. Intel DAAL helps to speed up big data analysis by providing highly optimized algorithmic building blocks for all stages of data analytics (preprocessing, transformation, analysis, modeling, validation, and decision making) in batch, online, and distributed processing modes of computation.”
Today Intel announced the inception of the Intel Data Analytics Acceleration Library (Intel DAAL) open source project. “Intel DAAL helps to speed up big data analysis by providing highly optimized algorithmic building blocks for all stages of data analytics (preprocessing, transformation, analysis, modeling, validation, and decision making) in batch, online, and distributed processing modes of computation. The open source project is licensed under Apache License 2.0.”