A Sneak Peak at MPI 3.0

Print Friendly, PDF & Email

Torsten Heofler has posted a preview of features coming in MPI 3.0, which is expected to release in the next 12 months.

New features include: Nonblocking Collective Operations, Neighborhood Collectives, Matched probe, MPIT Tool Interface #266, C Const Correctness, and Updated One Sided Chapter.

Updated One Sided Chapter #270.
This chapter was a good example of an excellent group effort! The new chapter offers:

  • Two memory models: one supporting cache-coherent systems (similar to many PGAS languages) and the other one is essentially the “old” MPI-2 model
  • different ordering modes for accumulate accesses (warning: the safe default mode may be easy to reason about but slower)
  • MPI_Win_allocate, a collective window creation function that allocates (potentially symmetric or specialized) memory for faster One Sided access
  • MPI_Win_create_dynamic, a mechanism to create a window that spans the whole address space together with functions to register (MPI_Win_attach) and deregister (MPI_Win_detach) memory locally
  • MPI_Get_accumulate, a fetch-and-accumulate function to atomically fetch and apply an operation to a variable
  • MPI_Fetch_and_op, a more specialized version of MPI_Get_accumulate with less parameters for atomic access to scalars only
  • MPI_Compare_and_swap, a CAS function as we know it from shared memory multiprogramming
  • MPI_R{put,get,accumulate,get_accumulate}, request-based MPI functions for local completion checking without window synchronization functions
  • MPI_Win_{un}lock_all, a function to (un)lock all processes in a window from a single process (not collective!)
  • MPI_Win_flush{_all}, a way to complete all outstanding operations to a specific target process (or all processes). Upon return of this function, the operation completed at the target (either in the private or public window copy)
  • MPI_Win_flush_local{_all}, a function to complete all operations locally to a specified process (or all processes). This does not include remote completion but local buffers can be re-used
  • conflicting accesses are now allowed but the outcome is undefined (and may corrupt the window). This is similar to the C++ memory model

While the post is designed mostly to inform users of what’s coming, Hoefler informs me that there is still some opportunity to influence the process at the last moment if something appears broken. Read the Full Story.