Steven Koonin, the undersecretary for science in the Department of Energy, wrote an opinion piece in the SF Chronicle earlier this week. He was responding to China's surge in the Top500 (why doesn't anyone mention Russia's surge?) and took the opportunity to sound the alarm about the need for the US to maintain HPC leadership:
While its systems currently rely on U.S. components, China is already constructing comparable machines using its domestic technology. These challenges to U.S. leadership in supercomputing and chip design threaten our country’s economic future.
Koonin rather sensibly points out that we need a two-pronged approach to this leadership quest: high-end hardware, and a push to encourage broader adoption of HPC, particularly by industry:
First, we must continue to push the limits of hardware and software. Second, to remain competitive globally, U.S. industry must better capture the innovation advantage that simulation offers. But bringing such innovation to large and small firms in diverse industries requires public-private partnerships to access simulation capabilities largely resident in the national laboratories and universities.
Language of crisis: is this all we’ve got?
Excellent points, and I strongly agree with them. What I don't agree with, however, is the language of crisis that Koonin is using. Phrases like "threaten our…future" and "staggering consequences" are not helpful in this context. As I've written before, it is a useful exercise to read the major blue ribbon HPC and IT reports written over the past decades. I recently did this, going all the way back to the early 1980s, and almost every one of them uses this language.
The only thing the country knows how to do in a crisis (run in, solve it, and run out) is throw money around.
We do not need money thrown arbitrarily at this problem. Other countries (Japan twice before, and now China) may occasionally reach up and grab one of the top spots, but as the visualization in this article shows, the US has two advantages: nearly all of the world's largest machines, and far more HPC capacity at every level than anywhere else in the world. We have the hero machines for the relatively few applications that can use them, but we also have a robust, nationally distributed HPC infrastructure that surpasses not only any other country in the world, but the rest of the world combined.
We have no moral right to this leadership position, and we will lose it if we don’t continue to invest and grow our capability. And that really would be a serious issue, far more serious than loss of the top spot itself.
We face fundamental challenges to our ability to effectively use and steward the HPC capabilities we are deploying now and will deploy in the future. Apart from a few outliers, typical multicore supercomputers today realize only a few percent of their much-ballyhooed peak performance on real-world applications. And we have no real idea how to fix this, or how to keep it from getting worse as chips gain more cores and as our machines stretch past the petaFLOPS range toward exaFLOPS.
Not “what can you sell me?”
These are seriously hard problems that cannot be solved by a dash-in, dash-out crisis response. We need decades of serious money plowed not into acquisition, but into fundamental research on chips, memory, interconnects, and programming paradigms, in partnership with a vibrant industrial computing ecosystem. The approach we need to take with our vendors, from Intel to Cray, is not "what can you sell me?" but "what can we build together?"
Why won't this happen? Because of the DOE. And the DoD. And the NSF. And every other entity that is solving its own local point optimization problem and working its own relationships with its own senators and representatives. Because there are too many rice bowls to protect, when what we need, as blue ribbon reports have pointed out for over 30 years, is a single national strategy to advance our collective computational capability (not bigger government, no new agencies; just cooperation and coordination, with teeth).
We need a marathon, not a sprint, because what we have is a long-term problem that is far, far more serious than the “crisis” of who owns the number one slot in the Top500.