A Special Report from: John Barr, The 451 Group
At the International Supercomputing Conference (ISC) in Hamburg earlier this year, Intel and SGI announced that they are collaborating to build an Exascale system by 2018, an impressive goal. However, a number of Intel’s statements about its plans for developing the components that will drive Exascale systems were received with skepticism by some in the HPC community. Intel seemed to be saying, ‘Trust us, we’ll take you to Exascale,’ without explaining how the many technical challenges would be addressed.
Intel builds many of the hardware and software components that will, in time, support Exascale systems, but does not build the systems themselves. That is SGI’s role in the partnership: it plans to build an Exascale system based on Intel’s MIC chips by 2018. The main problems that SGI is working on are density, power consumption, resiliency and the communication overhead incurred when millions of threads need to talk to each other. Intel drives Moore’s law to increase compute density, but SGI does not expect this alone to deliver all of the improvement required by 2018, so it is also working to increase the packaging density of its systems. Progress in line with Moore’s law would increase density by a factor of 25 between now and 2018, while the performance improvement required is more than a factor of 100. SGI plans to use Intel’s MIC processors to deliver the increased performance and density within a reasonable power budget, and to design very compact blades that can accommodate multiple MIC processors.
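The factor of 25 can be reproduced with a back-of-the-envelope calculation. As a sketch only, assume a doubling period of 18 months and a span of seven years to 2018; neither figure is stated in the report, and a different doubling period would change the result.

```latex
\[
  2^{\,7/1.5} \approx 2^{4.67} \approx 25,
  \qquad
  \frac{100}{25} = 4
\]
```

Under those assumptions, at least a further factor of four has to come from sources other than transistor scaling, which is where the MIC architecture and denser packaging come in.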
Building an Exascale system poses many challenges, and programming one is an extreme challenge in its own right. SGI has worked with many high-end customers to port their applications to GPU-accelerated architectures. While the performance improvements have often been spectacular, the effort involved can be significant, and the skills required are scarce. SGI favors the MIC architecture because, at least to some extent, x86 codes can be migrated to MIC without major reworking. Intel’s compilers provide a new pragma that automatically targets OpenMP loops at MIC devices: ‘#pragma offload target(mic)’.
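To make the pragma concrete, here is a minimal sketch of an OpenMP loop marked for offload. The function name `saxpy_offload` and the kernel itself are hypothetical, chosen only for illustration. The `in`, `inout` and `length` data-transfer clauses are written as we understand Intel’s offload extension and should be checked against the compiler documentation. A compiler without the extension ignores the unknown pragma and runs the loop on the host.

```c
/* Hypothetical example kernel: y = a * x + y over n elements. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    /* Intel offload extension (assumed syntax): copy x and y to the
       coprocessor, run the following construct there, and copy y back.
       Compilers without this extension ignore the line. */
    #pragma offload target(mic) in(x : length(n)) inout(y : length(n))
    /* Standard OpenMP: split the loop iterations across threads. */
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `saxpy_offload(4, 2.0f, x, y)` on two four-element arrays updates `y` in place, whether or not the loop was actually offloaded. Even in this small case the programmer has to state which data moves to and from the device, which hints at the work involved for larger codes.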
At ISC, Intel made an Exascale commitment: it would deliver 100 times the performance of today’s high-end systems in only double the power budget (in other words, roughly a 50-fold improvement in performance per watt), and today’s software model would be scaled to support Exascale systems. Intel has a strong team working on compilers and software development tools, augmented by a number of acquisitions in this space (such as Cilk Arts and RapidMind). On the one hand, it is extending the capabilities of existing tools and providing a level of support for MIC similar to that available for mainstream multiprocessor, multicore systems. This is a good thing, and makes the new architecture easily accessible to programmers. On the other hand, Intel is also working on advanced tools and techniques that will support scalability toward Exascale. These approaches include the Parallel Building Blocks: Cilk Plus, Threading Building Blocks and Array Building Blocks.
Some of the messaging about the programming model for Intel’s Many Integrated Core (MIC) architecture is misleading. While it is true to say that MIC is a multicore x86 implementation, we believe that the programming model for Exascale systems must change in response to scalability and resiliency requirements. The programming approach and tools used on today’s multicore x86 HPC systems can be applied to MIC-based systems, but going to Exascale is a whole new ballgame. We don’t believe that many of the algorithms used today will operate efficiently at Exascale, and resilience for such a large system must be handled at the application level, not at the system level (although that debate is ongoing). Targeting an OpenMP code at Intel’s MIC, or at an NVIDIA GPU using CUDA C/Fortran or PGI’s Accelerator Model, requires code changes, and very few codes of significant size will achieve a fantastic speed-up simply by adding the MIC pragma.
Talking about the Tri-Gate 3-D transistor, Intel claimed ‘infinite scalability,’ comparing 2-D transistors to a single-story building and Tri-Gate to a skyscraper. We believe that this overplays the benefits of Tri-Gate technology, which does a great job of increasing performance while decreasing power consumption – but it doesn’t deliver infinite scalability or stacked transistors.