Python Can Do It

Sponsored Post

Python is popular for prototyping high-end simulations and data science applications. But when it comes time to write a system for production, developers typically turn to C or other languages that let them write performant code that is close to the hardware and to the algorithms being implemented.

In addition, even as developing for the many cores within a system has shown its value, Python remains an effectively single-threaded environment, with the global interpreter lock (GIL) as the main bottleneck: only one thread can execute Python bytecode at a time, so threads must wait for one another before doing their assigned work. The result of this model is production code that is too slow to be useful for large simulations.
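For pure-Python, CPU-bound work, the GIL effect is easy to demonstrate. In the minimal sketch below, splitting a counting loop across two threads takes roughly as long as running it on one thread (exact timings vary by machine):

```python
import time
from threading import Thread

def spin(n):
    """CPU-bound work: burn cycles in pure Python."""
    while n > 0:
        n -= 1

N = 10_000_000

# Run the work once on the main thread.
start = time.perf_counter()
spin(N)
print(f"single thread: {time.perf_counter() - start:.2f}s")

# Split the same work across two threads. Because of the GIL,
# only one thread executes Python bytecode at a time, so this
# takes roughly as long as the single-threaded run.
threads = [Thread(target=spin, args=(N // 2,)) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"two threads:   {time.perf_counter() - start:.2f}s")
```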

Recently, many of the perceived bottlenecks to using Python for HPC jobs have been eliminated, often without the need to modify or recode existing Python applications. Developers and users can get very high performance from today's CPUs, such as the Intel® Xeon® Scalable processors, by leveraging advancements in hardware features and the latest powerful performance libraries that significantly boost overall Python application performance.

The Intel® Distribution for Python* provides a number of tools that help developers accelerate the execution of large Python software systems by delegating work to C functions that perform (when coded optimally) at machine-level speeds. These libraries can vectorize and parallelize the assigned workload and are aware of the different hardware architectures. Choices can be made at run time to take the most advantage of the underlying instruction sets and memory architectures, such as those present in the Intel® Xeon Phi™ processor. Various benchmarks have shown order-of-magnitude performance gains when running on the latest hardware and taking advantage of both vectorization and parallelization.
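In practice, the recipe is usually to replace interpreted loops with array operations that the optimized libraries can vectorize. A minimal sketch using plain NumPy (in the Intel Distribution, the same calls are backed by MKL):

```python
import time
import numpy as np

n = 10_000_000
x = np.random.rand(n)

# Interpreted loop: every iteration runs Python bytecode.
start = time.perf_counter()
total = 0.0
for v in x:
    total += v * v
print(f"python loop: {time.perf_counter() - start:.2f}s")

# Vectorized equivalent: one call into optimized C/MKL code
# that can use SIMD instructions across the whole array.
start = time.perf_counter()
total = np.dot(x, x)
print(f"np.dot:      {time.perf_counter() - start:.4f}s")
```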

Intel® Threading Building Blocks (Intel® TBB), which can be used instead of OpenMP, allows applications to split work between cores and to load balance the work assigned to each core. TBB decides, at run time, how to use the available cores to get maximum performance.
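In the Intel Distribution for Python, TBB is exposed through a `tbb` module that can coordinate nested parallelism, such as application-level thread pools running on top of MKL's internal threading. A hedged sketch of how an unmodified script might be launched under it (the script name is hypothetical and the exact invocation may differ by release; check your distribution's documentation):

```python
# my_simulation.py -- an ordinary NumPy workload; no code changes needed.
import numpy as np

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
c = a @ b          # matrix multiply, threaded by MKL under the hood
print(c.sum())

# Launching the unmodified script with TBB managing the thread pools,
# which helps avoid oversubscription when thread pools nest:
#   python -m tbb my_simulation.py
```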

Also available to developers is the Intel® Data Analytics Acceleration Library (Intel® DAAL), a framework of libraries that can easily be used in applications where machine learning (ML) is part of the solution. Intel DAAL can also handle and process streaming data, and it is integrated with a number of other Intel performance packages.
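As a sketch of what this looks like from Python, the daal4py bindings run K-means clustering roughly as follows; the class and parameter names are taken from the daal4py documentation and may vary by version:

```python
import numpy as np
import daal4py as d4p

# Synthetic data: 10,000 observations with 20 features.
data = np.random.rand(10000, 20)

# Choose starting centroids with k-means++ initialization.
init = d4p.kmeans_init(nClusters=4, method="plusPlusDense")
centroids = init.compute(data).centroids

# Run up to 50 iterations on the DAAL-optimized kernel.
result = d4p.kmeans(nClusters=4, maxIterations=50).compute(data, centroids)
print(result.centroids)
```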

Also optimized for Python on Intel CPUs are packages such as NumPy, which is accelerated with the Intel® Math Kernel Library (Intel® MKL); SciPy, the standard scientific toolset, also built on MKL; and Numba, a just-in-time compiler that uses the latest SIMD features and multi-core execution to get the very most out of today's CPUs. These are just a few of the scores of packages you'll find in the Intel® Distribution for Python*.
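Numba's contribution is easy to show: a decorated function is compiled on first call into vectorized, multi-threaded machine code. A minimal sketch:

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True, fastmath=True)
def mean_distance(x, y):
    """Mean absolute difference, JIT-compiled; prange splits the
    loop across cores and the loop body is SIMD-vectorized."""
    total = 0.0
    for i in prange(x.shape[0]):
        total += abs(x[i] - y[i])
    return total / x.shape[0]

x = np.random.rand(10_000_000)
y = np.random.rand(10_000_000)
print(mean_distance(x, y))  # first call compiles; later calls run at machine speed
```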

Although using tools and libraries supplied by a vendor can help produce more optimal code, an understanding of the algorithm and the code architecture is critical. Tuning will always need to be done by someone who understands the algorithms and the capabilities of the system. Taking advantage of the underlying hardware and assigning the appropriate workloads can lead to significant performance increases. Bottlenecks need to be identified, understood, and acted on.

Get the Intel® Distribution for Python Now