Over at Dr. Dobb's, Rob Farber has posted a tutorial on using MPI to tie together thousands of Intel Xeon Phi coprocessors. Farber runs his MPI example code on the Stampede supercomputer at TACC, achieving a remarkable 2.2 Petaflops of performance on 3,000 nodes.
This article demonstrates how to utilize Intel Xeon Phi coprocessors to evaluate a single objective function across a computational cluster using MPI. The example code can be used with existing numerical optimization libraries to solve real problems of interest to data scientists. Performance results show that the TACC Stampede supercomputer can indeed sustain multiple petaflops of average effective performance, that is, “honest flops” that account for all communications overhead. Small compute clusters of 256 nodes, which are affordable for schools and small research organizations, can exceed the peak theoretical performance of multimillion-dollar machines still operational at the smaller U.S. national laboratories, and can approach the performance of even large leadership-class supercomputers that are only a few years old.
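For readers who want a feel for the pattern before clicking through, here is a minimal sketch of the general approach, not Farber's actual code: each MPI rank evaluates its share of a single objective function (on Stampede, that per-rank work would be offloaded to the local Xeon Phi), and MPI_Reduce combines the partial sums into one objective value. The function f(), the problem size, and the even division of work across ranks are all illustrative assumptions.

/* Sketch only: distributed evaluation of one objective function with MPI.
 * The per-point contribution f() and the problem size are placeholders. */
#include <mpi.h>
#include <stdio.h>

/* Hypothetical per-point contribution to the objective function. */
static double f(double x) { return x * x; }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const long nPoints = 1000000;           /* total evaluation points (assumed) */
    const long perRank = nPoints / nranks;  /* each rank's share (assumes even split) */
    const long start   = rank * perRank;

    /* Each rank computes a partial sum; on a Xeon Phi system this loop is the
     * part that would be offloaded to the coprocessor. */
    double partial = 0.0;
    for (long i = start; i < start + perRank; i++)
        partial += f((double)i / nPoints);

    /* Combine all partial sums into the single objective value on rank 0. */
    double objective = 0.0;
    MPI_Reduce(&partial, &objective, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("objective = %g\n", objective);

    MPI_Finalize();
    return 0;
}

An optimization library running on rank 0 would call this kind of distributed evaluation once per candidate solution, which is what lets the objective function scale across thousands of coprocessors.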
Read the Full Story.