Contribution by regular readers Thomas Sterling (an HPC Rock Star) and Chirag Dekate of Louisiana State University. This article follows Sterling’s review of the past 12 months in HPC, given each year at ISC in Germany.
As the field of HPC enters the second decade of the 21st century, new directions in system structure, operation, and programming are being driven by the technical trends and application needs at extreme scale.
As never before, even with the expected continuation of Moore’s law, the opportunities for performance gains are threatened by the second turning of the decades-long S-curve that HPC has been traversing. This last year has seen dramatic evidence of the initial flattening, with the imposition of power and complexity constraints, as well as innovative approaches and market products to address them. At ISC 2010 in Hamburg, Germany, the authors were afforded the opportunity to review the events that best reflect the trends, directions, and accomplishments of the last year by the international supercomputing community: industry, academia, and national facilities and programs. The theme chosen for this year’s presentation on the state of the field in HPC was “Igniting Exaflops”, to acknowledge the major steps taken over the intervening 12 months to prepare the international community for a future of Exascale computing before the end of this decade. But first, let’s summarize some of the recent key achievements in HPC and their impact, as taken from the 7th annual ISC retrospective.
In brief, hex-core processors in 32 nanometer fabrication technology have become mainstream, replacing last generation quad-core chips in new product offerings based on commodity clusters, which continue to gain market share with respect to MPPs. Sockets combining multiple dies are becoming available, with up to 12 cores in cache-coherent SMP structures. Heterogeneous system structures are gaining traction with the increased integration of GPUs for floating point intensive applications.
Equally important in this direction are the advances in programming methodologies, with improved system software merging conventional APIs and CUDA or OpenCL making this emergent class of HPC systems of greater utility to the technical computing end users.
A major competition has been waged in the field of networking for clusters between Ethernet and InfiniBand, with Ethernet representing the larger deployed base but InfiniBand dominating the high end systems as well as the total aggregate performance across the Top500 list. Many applications of scientific and technical importance have been developed, pushing new discovery forward, with the first significant Petaflops scale applications running on such machines as Jaguar at Oak Ridge National Laboratory and recognized by the Gordon Bell Prize. Green computing has continued to gain attention, with advanced designs and techniques being applied to reduce overall energy requirements and limit the upward surge of peak power demand.
The Top500 and the race to the top
Jaguar at Oak Ridge National Laboratory is still the fastest supercomputer in the world as measured by the Linpack Benchmark (some systems are not similarly rated or reported). With a sustained performance of 1.76 Petaflops (and higher on some applications), this integration of Cray XT4 and XT5 subsystems based on AMD Opterons runs the SUSE Linux operating system and an array of compilers from multiple software vendors, and offers support for diverse programming models.
But a new contender, in second place, comes from Shenzhen, China; with a Linpack performance of 1.27 Petaflops it handily exceeds the coveted 1 Petaflops threshold. This system uses a heterogeneous system architecture of Dawning TC3600 blades with Intel X5650 processors and Nvidia Tesla C2050 (Fermi) GPUs. Indeed, the system’s peak performance of nearly 3 Petaflops exceeds that of Jaguar itself.
Roadrunner at LANL, the first Petaflops computer, is now entering its third year of operation using a heterogeneous architecture that incorporates IBM Cell processors with conventional AMD Opterons. Also at Oak Ridge is another Cray system, “Kraken”, that just breaks a Petaflops in peak capability using dual hex-core Opterons and the advanced Cray SeaStar2+ router. Germany’s Jugene IBM BG/P at Jülich also exhibits Petaflops peak performance with almost 300,000 PowerPC 450 cores. China retains its Tianhe system, which also peaks above a Petaflops with a cluster combining Intel Xeon processors and AMD GPUs. Other systems worth noting are Russia’s Lomonosov and Shaheen in Saudi Arabia, both providing hundreds of Teraflops. It should be noted that this year it was Hewlett-Packard that deployed the largest number of HPC systems, beating out IBM for the top slot. No other supplier even comes close to these two giants in this market.
The year in cores
The foundation of each of these super systems is its processor cores, and this year has seen significant advances from the semiconductor component manufacturers.
Intel dominates HPC system deployment and total aggregate performance with a number of slightly different offerings. The Westmere 2-core and 6-core X5600 processors are implemented in 32 nanometer technology. The IBM Power7 architecture is in 45 nanometer, with one of the largest processor dies ever, and pushes clock speed to above 4 GHz. This 8-core package will deliver a maximum of 265 Gigaflops and incorporates advanced pre-fetching of data and instructions. It will be integrated in the Blue Waters machine to be delivered to UIUC next year. The 8- and 12-core AMD Magny-Cours processor (in 45 nanometer technology) uses HyperTransport 4 inter-core communication technology for more efficient cache coherence.
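As a rough consistency check on the Power7 figures quoted above, peak floating point throughput can be treated as the product of core count, clock rate, and floating point operations per core per cycle. The minimal Python sketch below inverts that relation to find the clock rate implied by 265 Gigaflops across 8 cores. The value of 8 double precision operations per core per cycle is an assumption that is not stated in this article, and the function names are purely illustrative.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in Gigaflops: cores x clock (GHz) x flops per core per cycle."""
    return cores * clock_ghz * flops_per_cycle


def implied_clock_ghz(peak_gflops_value, cores, flops_per_cycle):
    """Clock rate (GHz) needed to reach a given theoretical peak."""
    return peak_gflops_value / (cores * flops_per_cycle)


# Figures from the article: 8 cores, 265 Gigaflops maximum.
# ASSUMPTION (not from the article): 8 double precision flops per core per cycle.
ASSUMED_FLOPS_PER_CYCLE = 8

clock = implied_clock_ghz(265.0, cores=8, flops_per_cycle=ASSUMED_FLOPS_PER_CYCLE)
print(f"Implied clock: {clock:.2f} GHz")
```

Under that assumption the implied clock lands just above 4 GHz, which is consistent with the clock speed claim in the paragraph above; a different per-cycle figure would shift the result proportionally.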
But what of Itanium? In the keynote address by Intel representatives at ISC 2010, no mention was made of its role, although it is known that a future roadmap exists with targets of Poulson in 2012 and Kittson in 2014. However, this year both Microsoft and Red Hat have announced that they will stop supporting this architecture. HP, one of the originators of much of the Itanium design, is expected to continue to deliver products based on the platform.
Also of note: Rock, Sun’s next-generation processor architecture, was terminated during the last year.
Nvidia has delivered its new GPU, Fermi, for improved double precision performance, and is making major strides in releasing improved CUDA and OpenCL software for programmer support. AMD has also advanced its ATI accelerator with the release of Cypress (RV 870) with better than half a Teraflops double precision peak performance.
Individual achievements were also acknowledged this year. Ken Miura of Fujitsu was given the Cray Award for his work in vector computing. The Fernbach Award went to Roberto Car and Michele Parrinello for their joint method in molecular dynamics. And the inaugural Kennedy Award was presented to Francine Berman of Rensselaer Polytechnic Institute for her pioneering work in building a national grid-based cyberinfrastructure in the US. William Gropp of UIUC was awarded this year’s IEEE TCSC Medal for Excellence in Scalable Computing. He was also recently elected to the US National Academy of Engineering.
With sadness we also note the passing of John Mucci, formerly of Digital Equipment Corporation and a cofounder of SiCortex in 2002.
Getting to exascale
This year also saw the inauguration of the first sponsored programs in Exascale computing.
The International Exascale Software Project (IESP) has involved participants from North America, Europe, and Asia in establishing a world-wide coordinated activity to develop the software infrastructure needed in preparation for Exaflops computer architectures targeted for deployment by the end of this decade. Major IESP technical congresses were held over the last year in France, Japan, and the UK to develop a joint international roadmap.
It is recognized by many (there is controversy on this point) that the methods and means for realizing Exaflops scale computing will of necessity prove very different from those which have successfully brought the field into the Petaflops era. It has been well understood that historically software has always lagged behind hardware, but this time software must precede hardware, both so that we will be ready to use such systems when they are developed and to inform that development through an understanding of software needs.
A second initiative that will lead to technologies applicable to Exascale system deployment is the US DARPA UHPC (Ubiquitous High Performance Computing) program. Although not explicitly established for this purpose, UHPC will produce prototypes of Petaflops racks within a power budget of 60 Kilowatts that could be integrated into full Exascale systems by the end of this decade.
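To put the UHPC rack target in perspective, the arithmetic can be sketched in a few lines of Python. The sketch assumes that each rack delivers exactly 1 Petaflops within exactly 60 Kilowatts, and that an Exaflops system is simply an aggregation of such racks; it ignores interconnect, storage, and cooling overheads, none of which are quantified in this article. The function and constant names are illustrative only.

```python
PETAFLOPS = 10**15  # floating point operations per second
EXAFLOPS = 10**18


def racks_needed(target_flops, rack_flops):
    """Number of whole racks needed to reach the target (ceiling division)."""
    return -(-target_flops // rack_flops)


def total_power_megawatts(racks, rack_kilowatts):
    """Aggregate power in Megawatts, counting rack power only."""
    return racks * rack_kilowatts / 1000.0


def gigaflops_per_watt(rack_flops, rack_kilowatts):
    """Energy efficiency implied by the per-rack target."""
    return rack_flops / (rack_kilowatts * 1000.0) / 1e9


# ASSUMPTIONS: 1 Petaflops and 60 Kilowatts per rack, rack power only.
racks = racks_needed(EXAFLOPS, PETAFLOPS)
print(racks, "racks")
print(total_power_megawatts(racks, 60), "MW")
print(round(gigaflops_per_watt(PETAFLOPS, 60), 1), "Gigaflops per Watt")
```

Under these simplifying assumptions, an Exaflops system would comprise 1,000 such racks drawing 60 Megawatts in total, or roughly 16.7 Gigaflops per Watt.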
Proposals have been submitted, and DARPA should be announcing the winners before next month. This is a very exciting program with a very real prospect of reinventing how future scalable computing will be achieved.
The US DOE has also launched some new programs relevant to Exascale computing, including one to realize the goal of an X-Stack, the software infrastructure that will be required for Exaflops computing. This program has already received proposals, and will be announcing selected investigators in the near future.
Together, these and other programs begun this year, along with many technical workshops that have also been conducted within the last twelve months, are rapidly putting the world on track to aggressively and effectively move all aspects of system development forward towards the performance goals of the year 2020.
This year has been one of significant product advances, application accomplishments, and initiation of important pathfinding work. The coming year is anticipated to be even more valuable.
Dr. Thomas Sterling is a Professor of Computer Science at Louisiana State University, a Faculty Associate at California Institute of Technology, a CSRI Fellow for Sandia National Laboratories, and a Distinguished Visiting Scientist at Oak Ridge National Laboratory. He has also been recognized as an HPC Rock Star by insideHPC.
Chirag Dekate is pursuing a PhD at LSU; his topic is resource management and scheduling of dynamic data driven graph executions.