Over at the Parallel for All Blog, Everett Phillips and Massimiliano Fatica write that GPUs deliver good acceleration on the new HPCG benchmark, which was designed to augment Linpack as a measure of performance for the TOP500. Their GPU porting strategy focused on parallelizing the symmetric Gauss-Seidel smoother (SYMGS), which accounts for roughly two-thirds of the benchmark's flops.
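To make the SYMGS kernel concrete, here is a minimal sequential sketch of one symmetric Gauss-Seidel application on a matrix stored in compressed sparse row (CSR) form: a forward sweep over the rows followed by a backward sweep. It follows the general shape of a textbook SYMGS smoother and is written in plain Python for readability; the names `CSRMatrix`, `symgs`, `laplacian_1d` and `matvec`, and the small 1D Laplacian test matrix, are illustrative assumptions rather than the authors' GPU code.

```python
from typing import List


class CSRMatrix:
    """Sparse matrix in compressed sparse row (CSR) form: the nonzeros of
    row i are values[row_ptr[i]:row_ptr[i + 1]], located in the columns
    col_idx[row_ptr[i]:row_ptr[i + 1]]."""

    def __init__(self, row_ptr: List[int], col_idx: List[int], values: List[float]):
        self.row_ptr = row_ptr
        self.col_idx = col_idx
        self.values = values
        self.n = len(row_ptr) - 1

    def diag(self, i: int) -> float:
        for k in range(self.row_ptr[i], self.row_ptr[i + 1]):
            if self.col_idx[k] == i:
                return self.values[k]
        raise ValueError(f"row {i} has no diagonal entry")


def _relax_row(A: CSRMatrix, r: List[float], x: List[float], i: int) -> None:
    # Gauss-Seidel update of x[i]. It reads the *current* x, including
    # entries already overwritten earlier in the same sweep; this
    # row-to-row dependency is what limits parallelism.
    s = r[i]
    for k in range(A.row_ptr[i], A.row_ptr[i + 1]):
        s -= A.values[k] * x[A.col_idx[k]]
    d = A.diag(i)
    s += d * x[i]  # undo the diagonal term subtracted in the loop
    x[i] = s / d


def symgs(A: CSRMatrix, r: List[float], x: List[float]) -> List[float]:
    """One symmetric Gauss-Seidel application, updating x in place."""
    for i in range(A.n):            # forward sweep: rows 0 .. n-1
        _relax_row(A, r, x, i)
    for i in reversed(range(A.n)):  # backward sweep: rows n-1 .. 0
        _relax_row(A, r, x, i)
    return x


def laplacian_1d(n: int) -> CSRMatrix:
    """Tridiagonal [-1, 2, -1] test matrix (illustrative only)."""
    row_ptr, col_idx, values = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(col_idx))
    return CSRMatrix(row_ptr, col_idx, values)


def matvec(A: CSRMatrix, x: List[float]) -> List[float]:
    """Sparse matrix-vector product y = A x."""
    return [
        sum(A.values[k] * x[A.col_idx[k]] for k in range(A.row_ptr[i], A.row_ptr[i + 1]))
        for i in range(A.n)
    ]


# Usage: repeated SYMGS applications converge to the solution of A x = r.
A = laplacian_1d(8)
r = matvec(A, [1.0] * A.n)  # right-hand side whose exact solution is all ones
x = [0.0] * A.n
for _ in range(500):
    symgs(A, r, x)
print(max(abs(v - 1.0) for v in x))
```

Because each row update consumes values written by earlier rows in the same sweep, a straightforward implementation is inherently sequential, which is why the smoother is the hard part to map onto thousands of GPU threads.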
GPU-accelerated supercomputers have proven to be very effective for accelerating compute-intensive applications like HPL, especially in terms of power efficiency. Obtaining good acceleration on the GPU for the HPCG benchmark is more challenging due to the limited parallelism and memory access patterns of the computational kernels involved. In this post we present the steps taken to obtain high performance on the HPCG benchmark on GPU-accelerated clusters, and demonstrate that our GPU-accelerated HPCG results are the fastest per-processor results reported to date.
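One widely used way to expose parallelism in Gauss-Seidel is multicoloring: rows that are not directly coupled receive the same color, so all rows of one color can be relaxed concurrently, and a SYMGS application becomes a forward pass over the colors followed by a backward pass in reverse color order. The sketch below assumes this technique purely for illustration (the excerpt above does not say which reordering the authors used). It uses a 2D 9-point stencil as a small stand-in for HPCG's 3D 27-point stencil, and `stencil_neighbors`, `greedy_coloring` and `rows_by_color` are hypothetical helper names.

```python
from collections import defaultdict
from typing import List


def stencil_neighbors(nx: int, ny: int) -> List[List[int]]:
    """Neighbor lists for an nx-by-ny grid with a 9-point stencil:
    each point is coupled to the (up to) 8 points surrounding it."""
    nbrs = []
    for y in range(ny):
        for x in range(nx):
            row = []
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if (dx, dy) == (0, 0):
                        continue
                    xx, yy = x + dx, y + dy
                    if 0 <= xx < nx and 0 <= yy < ny:
                        row.append(yy * nx + xx)
            nbrs.append(row)
    return nbrs


def greedy_coloring(nbrs: List[List[int]]) -> List[int]:
    """Give each row the smallest color not already used by a colored neighbor."""
    colors = [-1] * len(nbrs)
    for i, row in enumerate(nbrs):
        used = {colors[j] for j in row if colors[j] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[i] = c
    return colors


def rows_by_color(colors: List[int]) -> List[List[int]]:
    """Group row indices by color. Rows within one group share no coupling,
    so each group could be relaxed as one parallel stage."""
    groups = defaultdict(list)
    for i, c in enumerate(colors):
        groups[c].append(i)
    return [groups[c] for c in sorted(groups)]


nbrs = stencil_neighbors(6, 5)
colors = greedy_coloring(nbrs)
groups = rows_by_color(colors)
# Sanity check: no row shares a color with any of its neighbors.
assert all(colors[i] != colors[j] for i, row in enumerate(nbrs) for j in row)
print(len(groups), "colors")
```

On this grid, greedy coloring in natural order yields four colors, so each half-sweep runs as four parallel stages instead of 30 row updates in strict natural order. The trade-off is that the reordering can change how quickly the smoother converges compared with the natural ordering.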
The first HPCG list was published at ISC14 and included 15 supercomputers. Instead of looking at the peak flops of these machines, we evaluate efficiency as the ratio of the HPCG result to the memory bandwidth of the processors. The following table shows the results of the four top-ranked systems that submitted optimized results.
HPCG RANK | MACHINE NAME | HPCG (GFLOP/S) | #PROCS | PROCESSOR TYPE | HPCG PER PROC (GFLOP/S) | BANDWIDTH PER PROC (GB/S) | EFFICIENCY (FLOPS/BYTE) |
---|---|---|---|---|---|---|---|
1 | Tianhe-2 | 580,109 | 46,080 | Xeon Phi 31S1P | 12.59 | 320 | 0.039 |
2 | K | 426,972 | 82,944 | SPARC64 VIIIfx | 5.15 | 64 | 0.080 |
3 | Titan | 322,321 | 18,648 | Tesla K20X + ECC | 17.28 | 250 | 0.069 |
5 | Piz Daint | 98,979 | 5,208 | Tesla K20X + ECC | 19.01 | 250 | 0.076 |
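The efficiency column in the table is simple arithmetic: divide each machine's HPCG result by its processor count to get per-processor GFLOP/s, then divide by that processor's memory bandwidth in GB/s, which leaves flops per byte. The short check below recomputes those two columns from the table's own figures; the names `SYSTEMS`, `per_proc_gflops` and `efficiency` are illustrative.

```python
# (machine, HPCG GFLOP/s for the whole machine, processor count,
#  memory bandwidth per processor in GB/s), copied from the table
SYSTEMS = [
    ("Tianhe-2", 580_109, 46_080, 320),
    ("K", 426_972, 82_944, 64),
    ("Titan", 322_321, 18_648, 250),
    ("Piz Daint", 98_979, 5_208, 250),
]


def per_proc_gflops(total_gflops: float, procs: int) -> float:
    """HPCG result divided evenly across processors."""
    return total_gflops / procs


def efficiency(total_gflops: float, procs: int, bw_gbs: float) -> float:
    """(GFLOP/s) / (GB/s) = flops per byte of memory bandwidth."""
    return per_proc_gflops(total_gflops, procs) / bw_gbs


for name, total, procs, bw in SYSTEMS:
    print(f"{name:10s} {per_proc_gflops(total, procs):6.2f} GFLOP/s  "
          f"{efficiency(total, procs, bw):.3f} flops/byte")
```

Running it reproduces the per-processor and efficiency values in the table, up to rounding. Normalizing by bandwidth rather than peak flops reflects that HPCG's sparse kernels are limited mainly by memory traffic rather than arithmetic throughput.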
If you’d like to learn more about this work on HPCG, be sure to attend Everett Phillips’ talk in the NVIDIA booth (#1727) at Supercomputing 2014 on Tuesday, November 18 at 10:30 am.