Back in 1992, programmers and computer industry specialists from 40 organizations in government, academia, and the private sector gathered in Williamsburg, Virginia, for a Workshop on Standards for Message Passing in a Distributed Memory Environment. There they discussed and developed what is today the widely used Message Passing Interface (MPI) specification for programming distributed memory clusters. MPI is now the standard and most common parallel programming model for developing portable, efficient, high-performance parallel applications.
More recently, a hybrid parallel programming model that combines MPI with OpenMP* has evolved to take advantage of multicore and many-core architectures, such as Intel Xeon® processors and Intel Xeon Phi™ coprocessors. This model uses MPI to communicate between nodes and OpenMP to control the groups of threads running on each node.
With the release of Intel Parallel Studio XE 2017, the focus is on making applications perform better on Intel architecture-based clusters. Intel MPI Library 2017, a fully integrated component of Intel Parallel Studio XE 2017, implements the high-performance MPI-3.1 specification on multiple fabrics. It enables programmers to quickly deliver the best parallel performance, even when they change or upgrade to new interconnects, without requiring changes to the software or operating environment.
This interconnect independence means that programmers can develop MPI codes without concern for the particular fabric the application will run on: it will run efficiently on whatever network the user chooses at runtime, whether TCP sockets, shared memory, or one of many Remote Direct Memory Access (RDMA)-based interconnects, including InfiniBand*. The Intel MPI Library provides an accelerated, universal, multi-fabric layer for fast interconnects through the Direct Access Programming Library* (DAPL) or OpenFabrics Association* (OFA) methodologies. It automatically chooses the fastest transport available, while also reducing the memory footprint through several methods that allocate only the memory space actually required.
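As a sketch of what runtime fabric selection looks like, the library's documented I_MPI_FABRICS environment variable takes an intra-node:inter-node pair; the application binary (`./myapp` here is a placeholder) is the same in both cases.

```shell
# Shared memory within a node, DAPL (e.g., over InfiniBand) between nodes
export I_MPI_FABRICS=shm:dapl
mpirun -n 64 ./myapp

# Same binary, falling back to TCP sockets on an Ethernet-only cluster
export I_MPI_FABRICS=shm:tcp
mpirun -n 64 ./myapp
```

If I_MPI_FABRICS is not set, the library selects the fastest available fabric automatically.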
Also, Intel MPI Library 2017 is binary compatible with MPI-1.x and MPI-2.x applications, so existing applications can take advantage of its performance improvements without recompiling, simply by linking to the latest Intel MPI Library. Compatibility extends to supporting the performance enhancements in the newest Intel products, including AVX2, TSX, FMA3, and AVX-512, while preserving the ability to run on older Intel and compatible processors.
Both Linux and Windows operating systems are supported. Using the Hydra MPI process manager, a single MPI job can run over a cluster with mixed Linux and Windows operating systems. This adds greater flexibility in job deployment.
An even more fascinating development is the support for MPI in Python. While Fortran, C, and more recently C++ have traditionally been the programming languages of choice in HPC, Python has become increasingly popular. Its wide range of modules allows for easy development of scientific and engineering applications related to modeling, simulation, and design. This raises the question of how to use MPI from Python code.
MPI for Python (mpi4py), now part of the Intel Distribution for Python and Intel Parallel Studio XE 2017, provides an object-oriented approach to MPI. Its interface was designed to translate the syntax and semantics of the standard C++ MPI bindings to Python. This makes it possible to implement many algorithms directly in Python, with negligible overhead and almost the same performance as compiled Fortran, C, or C++ code.
MPI for Python is one of the main components of the ParaView multiplatform data analysis and visualization application, and the yt project, an integrated science environment for solving astrophysical problems.
Altogether, Intel Parallel Studio XE 2017 provides an integrated toolset for developing, optimizing, and tuning the performance and reliability of MPI applications. For example, Intel Trace Analyzer and Collector, a scalable MPI performance profiler, can be used to visualize application behavior and verify its correctness. This is the right environment for parallel programming.
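As a sketch of the workflow (the application name is a placeholder): Intel MPI's launcher accepts a -trace option that collects an event trace at run time, producing an .stf file that can then be opened in the Trace Analyzer GUI.

```shell
# Collect an event trace of the run (loads the Intel Trace Collector library)
mpirun -trace -n 16 ./myapp

# Open the resulting trace file in Intel Trace Analyzer
traceanalyzer ./myapp.stf
```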