Accelerated Python for Data Science

Python* has become one of the most popular programming languages in use today. Easy to learn and backed by a vast collection of open source packages and libraries, Python has found its way into just about every computational domain, especially data science.

The Intel® Distribution for Python*, available as part of Intel Parallel Studio XE or as a standalone package, achieves accelerated performance with compilers and libraries optimized for the latest Intel architectures. In particular, the library packages for data analytics and numerical computing included in this distribution now scale across multi-core and many-core processors as well as distributed cluster and cloud infrastructures.

Combine Python with the Numba just-in-time (JIT) compiler, the Cython compiler, and runtime packages built on Intel performance libraries such as Intel Math Kernel Library (Intel MKL) and Intel Data Analytics Acceleration Library (Intel DAAL), and you get C-like, near-native performance, plus Python bindings for your existing C and C++ libraries.
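As a rough illustration of the JIT approach, the sketch below decorates a plain numerical loop with Numba's `njit`. The function name `integrate_square` and the midpoint-rule workload are illustrative choices, not taken from the Intel distribution, and the fallback decorator is an assumption added so the snippet still runs where Numba is not installed.

```python
try:
    from numba import njit  # compiles the decorated function to machine code on first call
except ImportError:
    def njit(func):
        """Fallback (assumption): run the plain Python function when Numba is absent."""
        return func


@njit
def integrate_square(n_steps):
    """Midpoint-rule estimate of the integral of x**2 over [0, 1]."""
    width = 1.0 / n_steps
    total = 0.0
    for i in range(n_steps):
        x = (i + 0.5) * width  # midpoint of the i-th sub-interval
        total += x * x * width
    return total


estimate = integrate_square(100_000)
print(estimate)  # close to 1/3
```

The loop body is ordinary Python; with Numba present, the first call triggers compilation and later calls reuse the compiled code. Any speedup depends on the workload and hardware.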

Data analytics involves some serious math, requiring knowledge from a wide variety of fields. A single application might integrate many complex solutions, increasing development time and risk. Intel DAAL was designed to cover most use cases around data analytics. It provides the building blocks a developer needs for all stages of data analysis, from data acquisition through prediction and decision making. It scales from a single node to a large cluster with remote storage without additional effort.

Working alongside popular Python packages such as NumPy, SciPy, and scikit-learn*, Intel DAAL is a highly optimized library of computationally intensive routines supporting Intel architectures, including Intel Xeon® processors, Intel Core processors, Intel Atom processors, and Intel Xeon Phi™ processors. Intel DAAL provides a rich set of algorithms, ranging from basic descriptive statistics for datasets to more advanced data mining and machine learning algorithms.

Python can easily use Intel DAAL for robust, scalable, high-performance data processing through the daal4py package. Python developers can take advantage of the features and optimizations of the Intel DAAL library right out of the box. Intel DAAL has been shown to give a substantial performance boost over alternatives, and data scientists programming in Python can use it to implement batch and online processing, clustering, and much more right within their Python applications.

So, connecting Python to Intel® DAAL is a match made for Data Science applications.
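To make the difference between batch and online processing concrete, here is a minimal plain-Python sketch. It does not use the daal4py API; the `RunningStats` class is a hypothetical helper built on Welford's algorithm, shown only to illustrate how an online computation can consume data chunk by chunk and still agree with a batch computation over the full dataset.

```python
import statistics


class RunningStats:
    """Hypothetical helper: mean and sample variance updated one chunk at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, chunk):
        # Welford's update: each value adjusts the mean and the deviation sum.
        for value in chunk:
            self.count += 1
            delta = value - self.mean
            self.mean += delta / self.count
            self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Sample variance; requires at least two observations.
        return self._m2 / (self.count - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Batch mode: the whole dataset is available at once.
batch_mean = statistics.mean(data)
batch_variance = statistics.variance(data)

# Online mode: the data arrives in chunks of three values.
online = RunningStats()
for start in range(0, len(data), 3):
    online.update(data[start:start + 3])

print(batch_mean, online.mean)
print(batch_variance, online.variance)
```

The online form never holds more than one chunk in memory, which is why this style of processing suits datasets too large to load at once.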

On distributed parallel systems, Intel Python supports the mpi4py library, which interfaces with the Intel MPI Library over InfiniBand and the Intel Omni-Path communications fabric. The result is decreased latency and increased scaling for distributed Python applications.
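A minimal sketch of the mpi4py programming model follows: each rank sums its own slice of a range and the partial sums are combined with `allreduce`. The problem size and the serial fallback are assumptions added for illustration, so that the snippet also runs on a machine without mpi4py or an MPI runtime.

```python
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # Assumption for illustration: behave like a single-rank job without mpi4py.
    comm, rank, size = None, 0, 1

N = 1000  # illustrative problem size

# Each rank handles every size-th element, starting at its own rank index.
local_sum = sum(range(rank, N, size))

# Combine the partial sums so that every rank holds the global total.
if comm is not None:
    total = comm.allreduce(local_sum, op=MPI.SUM)
else:
    total = local_sum

if rank == 0:
    print(f"ranks={size} total={total}")
```

With an MPI runtime installed, a script like this would typically be launched through the MPI launcher, for example `mpirun -n 4 python script.py`, where the script name is a placeholder.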

The Intel Distribution for Python takes advantage of the Intel® Advanced Vector Extensions (Intel® AVX) and multiple cores in the latest Intel architectures. By utilizing the highly optimized Intel MKL BLAS and LAPACK routines, key functions run up to 200 times faster on servers and 10 times faster on desktop systems.

This means that existing Python applications will perform significantly better merely by switching to the Intel distribution.
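The reason a distribution swap can help without code changes is that NumPy hands dense linear algebra to whichever BLAS and LAPACK libraries it was built against. The sketch below, with illustrative matrix sizes and a hypothetical `naive_matmul` helper, runs a matrix product and a linear solve through NumPy and checks the product against a pure-Python loop. It makes no claim about the speedup on any particular system.

```python
import numpy as np


def naive_matmul(a, b):
    """Pure-Python matrix product on nested lists, used only as a reference."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


rng = np.random.default_rng(seed=0)
a = rng.random((40, 30))
b = rng.random((30, 20))

# NumPy dispatches this floating-point product to its BLAS backend.
fast = a @ b
slow = np.array(naive_matmul(a.tolist(), b.tolist()))
matches = bool(np.allclose(fast, slow))

# np.linalg.solve calls into LAPACK; the added diagonal keeps the system well conditioned.
m = rng.random((50, 50)) + 50.0 * np.eye(50)
rhs = rng.random(50)
x = np.linalg.solve(m, rhs)
residual = float(np.linalg.norm(m @ x - rhs))

print(matches, residual)
```

Calling `np.show_config()` reports which BLAS and LAPACK libraries a given NumPy build is linked against, which is a quick way to confirm whether MKL is in use.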

But it’s not just about optimized libraries and compilers that generate vector and parallel code for the latest Intel processors. There is also first-class Python support in the Intel VTune™ Amplifier performance analyzer, part of the Intel Parallel Studio XE suite of tools. Intel VTune™ provides line-by-line source code profiling to help find and correct performance hotspots and bottlenecks in Python as well as Numba*, C, and C++ source.

The latest Intel Distribution for Python release offers many performance advances, including:

  • Faster machine learning with scikit-learn key algorithms accelerated with Intel® Data Analytics Acceleration Library
  • The latest TensorFlow* and Caffe* libraries optimized for Intel® architecture available on the Intel channel at anaconda.org

So, with Intel acceleration, it’s really no surprise to see the growing presence of Python in high-performance computing, big data, machine learning, and data science.

Download Intel Distribution for Python now

Get your free download of Intel® Parallel Studio XE
