As multi-socket and then multi-core systems became the standard, the Message Passing Interface (MPI) emerged as one of the most popular programming models for applications that run in parallel across many sockets and cores. Shared memory programming interfaces, such as OpenMP, have let developers exploit the shared memory within each server in systems built from many individual servers. The drawback is that this hybrid approach requires using two different programming models at the same time.
The MPI 3.0 standard introduces an interprocess shared memory extension (MPI SHM). This feature is supported by many MPI implementations, including the Intel MPI Library. The MPI SHM extension lets programmers create regions of shared memory that are directly accessible by the MPI processes on the same node.
As the number of cores within a node continues to increase, they all share the same local memory on that node. Sharing data within the node is more efficient than communicating across a network, but it also puts pressure on the amount of memory available within the node. Applications therefore need to balance their use of node-local memory against memory on other systems across the network.
Various benchmarks can be run to determine which method is best for a particular application, whether MPI + OpenMP or the MPI SHM extensions. On a fairly simple test case, speedups over a base version that used point-to-point communication were up to 5X, depending on the message size.
By adding the shared memory extension to MPI, developers gain another technique for extracting maximum performance from large systems that use both Intel Xeon processors and Intel Xeon Phi coprocessors. A modified MPPTEST benchmark achieved almost a 5X improvement over a standard point-to-point approach. It is important to note that the benefits of MPI SHM tend to grow as message sizes increase and as more neighbors participate in the communication.
Source: Intel, USA