
UPDATE: Glenn Lockwood Looks at Intel’s Xeon Phi Uptake Measured from Real Users

UPDATE: We pointed to this industry perspective last week, but since then Glenn Lockwood's post about coprocessor usage levels on Stampede has been updated with additional data courtesy of TACC's Bill Barth. Check out the blog comments for more; this is a story worth following.

Over at his blog, Glenn K. Lockwood from SDSC writes that the recent news about LSU receiving a $4m NSF grant for a new cluster left him wondering: are people actually using Xeon Phi processors for research, or has uptake been inflated by marketing? To find out, he looked into usage on the flagship of such machines, the Stampede supercomputer at TACC.

Despite having over 6,000 Intel Xeon Phi coprocessors, TACC's Stampede system is seeing extremely low utilization of these coprocessors. The vast, vast majority of users on Stampede are using that machine just as they would any other large-scale cluster: they want to run regular old MPI code on regular old CPUs. This is not entirely unexpected to those of us who are in the trenches using supercomputers and porting code on a daily basis, but I suspect a lot of people have blindly accepted the marketing from Intel that Xeon Phi is a magical product since existing MPI and OpenMP codes can run on it. As this quick workload analysis shows, there is no magic. Xeon Phi is not seeing very much adoption yet. This is a bit concerning because, of the 9.6 petaflops of performance boasted by Stampede, 7.4 of those petaflops (about 77%) are provided by the Xeon Phi accelerators and 2.2 are provided by the CPUs. If you take into consideration that less than 6% (on a core-hour basis) of the jobs running on Stampede use these MICs, it turns out that Stampede is grossly underutilized: somewhere close to 75% of Stampede's deliverable FLOPs are not being delivered because users' applications can't use Xeon Phi. This is close to a 25% overall utilization level for the machine, which is abysmal in the world of non-accelerated clusters.
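The utilization estimate in the excerpt is back-of-the-envelope arithmetic, and a few lines of Python make the steps explicit. This is a minimal sketch under simplifying assumptions that are ours, not Lockwood's: every CPU FLOP counts as delivered, the Phi cards are busy only in the share of core-hours that comes from MIC-enabled jobs, and "less than 6%" is treated as an upper bound of 0.06. The function name `flop_breakdown` is hypothetical.

```python
# Peak figures quoted for Stampede, in petaflops.
TOTAL_PF = 9.6
PHI_PF = 7.4

# Share of core-hours from jobs that use the Xeon Phi cards. The post says
# "less than 6%", so 0.06 is used here as an upper bound (assumption).
MIC_JOB_SHARE = 0.06


def flop_breakdown(total_pf, phi_pf, mic_job_share):
    """Return (phi_share, idle_share, delivered_share) as fractions of peak.

    Simplifying assumptions (hypothetical model, not from the original post):
    all CPU FLOPs are delivered, and the Phi FLOPs are delivered only in the
    fraction of core-hours that come from MIC-enabled jobs.
    """
    phi_share = phi_pf / total_pf                   # Phi fraction of peak
    idle_share = phi_share * (1.0 - mic_job_share)  # Phi FLOPs left unused
    delivered_share = 1.0 - idle_share              # everything else is used
    return phi_share, idle_share, delivered_share


phi, idle, delivered = flop_breakdown(TOTAL_PF, PHI_PF, MIC_JOB_SHARE)
print(f"Phi share of peak:      {phi:.0%}")
print(f"Undelivered peak FLOPs: {idle:.0%}")
print(f"Delivered peak FLOPs:   {delivered:.0%}")
```

Under these assumptions roughly 72% of peak goes undelivered and about 28% is delivered, in line with the "close to 75%" and "close to 25%" figures in the excerpt; a lower MIC job share pushes both numbers closer to those round values.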

Read the Full Story.
