We heard some very good reviews of a talk given by Doug Miles of The Portland Group at the bi-annual Clouds, Clusters and Data for Scientific Computing technical meeting outside of Lyon, France in mid- September. Most of the talks from that meeting are available online at the CCDSC 2012 website, but the PGI talk did not include any slides. PGI has provided the Exascale Report with a copy of the transcript from the talk, which we have reproduced here with a few minor edits.
Most of today’s CPU-only large-scale systems have a similar look-and-feel: many homogeneous nodes communicating via MPI, each node has a few identical processor chips, each chip has multiple identical cores, each core has some SIMD processing capability. Programming one such system is very like programming any other, regardless of chip vendor, number of total cores, number of cores per node, SIMD width or interconnect fabric.
Setting aside Accelerator-enabled systems for a minute, how did we get here? How did we reach this level of homogeneity from such heterogeneous HPC system roots? 25 or 30 years ago we had vector machines, VLIW machines, SMP machines, massively parallel SIMD machines, and literally scores of different instruction set architectures. How did systems become so homogeneous?