Today Nvidia announced details of Titan, a 20 Petaflop (peak) heterogeneous supercomputer to be deployed at Oak Ridge National Laboratory sometime in 2012. Powered by a Cray XK6 system with 18,000 NVIDIA Tesla GPUs, Titan will reportedly be more than two times faster and three times more energy efficient than today’s fastest supercomputer, the K computer located in Japan.
“All areas of science can benefit from this substantial increase in computing power, opening the doors for new discoveries that so far have been out of reach,” said Jeff Nichols, Associate Laboratory Director for Computing and Computational Sciences at Oak Ridge National Laboratory. “Titan will be used for a variety of important research projects, including the development of more commercially viable biofuels, cleaner burning engines, safer nuclear energy, and more efficient solar power.”
Yesterday I spoke to Steve Scott, CTO of Tesla products at NVIDIA, about the announcement and his recent move to Nvidia from a similar role at Cray. There were three main takeaways:
- Energy constraints are driving technology. In Scott’s words, the world has changed: power constraints make previous architectures impractical at this scale. Titan will draw roughly the same power as Jaguar (7-10 MW), but will be 10x more powerful.
- Heterogeneous validation. ORNL serves a wide variety of research disciplines. Heterogeneous systems have demonstrated significant performance gains in a range of codes used by researchers served by the Lab.
- Viable HPC ecosystems. Nvidia’s technology is right for the needs of supercomputing at extreme scale, but what attracted Scott to the company is that it also has the right business model (supported by high-volume graphics cards) to succeed.
“Oak Ridge’s decision to base Titan on Tesla GPUs underscores the growing belief that GPU-based heterogeneous computing is the best approach to reach exascale computing levels within the next decade,” said Steve Scott, chief technology officer of Tesla products at NVIDIA, referring to computing performance levels of 1,000 petaflops. “The Tesla GPUs will provide over 85 percent of the peak performance of Titan. You simply can’t get this level of performance in a power- and cost-efficient way with CPUs alone.”
Titan’s first stage of deployment is currently underway: ORNL is upgrading its existing Jaguar supercomputer with 960 Tesla M2090 GPUs based on the “Fermi” architecture, which will serve as companion processors to multi-core CPUs in the Cray XK6. In the second phase, expected to begin in 2012, Oak Ridge plans to deploy up to 18,000 Tesla GPUs based on the next-generation architecture, code-named “Kepler.”
The K supercomputer has 10.725 PF peak and 8.162 PF LINPACK Rmax for 9.89 MW, which is 1.084 GF(Peak)/W or 0.825 GF(Rmax)/W (http://www.top500.org/system/performance/10810).
Cray’s own slides say the XK6 will achieve ~1.2 GF(Rmax)/W (http://www.ena-hpc.org/talks/oed-slides.pdf). Public data on the NVIDIA X2090 and AMD 6200 suggest 2.5 GF(Rmax)/W or >3.25 GF(Peak)/W. Either Cray is off by more than a factor of two in their GF(Rmax)/W predictions or they are going to do more than 50% better than the best possible GF(Peak)/W one can derive from available data.
Furthermore, to do 20 PF peak at >3.25 GF(Peak)/W means Titan will use less than 6.15 MW, which is inconsistent with the claim above that Jaguar does 7-10 MW and Titan will be similar.
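The arithmetic behind these efficiency figures is straightforward; here is a minimal sketch that reproduces the numbers quoted above (all inputs are taken from the comment itself, and the `gf_per_watt` helper is just for illustration):

```python
def gf_per_watt(petaflops, megawatts):
    """Convert PF and MW to GF/W.

    (petaflops * 1e6 GF) / (megawatts * 1e6 W) simplifies to a plain ratio.
    """
    return petaflops / megawatts

# K computer: 10.725 PF peak, 8.162 PF Rmax, at 9.89 MW
k_peak_eff = gf_per_watt(10.725, 9.89)  # ~1.084 GF(Peak)/W
k_rmax_eff = gf_per_watt(8.162, 9.89)   # ~0.825 GF(Rmax)/W

# If Titan hits 20 PF peak at >3.25 GF(Peak)/W, its power budget is:
titan_power_mw = 20 / 3.25              # ~6.15 MW

print(round(k_peak_eff, 3), round(k_rmax_eff, 3), round(titan_power_mw, 2))
```

The 6.15 MW result is what conflicts with the 7-10 MW figure quoted for Jaguar earlier in the article.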
In conclusion, the numbers in this article just don’t add up.
Jeff, I believe the power consumption numbers are for the overall system including storage. Peak numbers are based on future Kepler GPUs, which reportedly have a similar power envelope to today’s devices.