I was finishing my graduate work in the early (and then the late) 1990s under Joe Thompson’s direction at the MSU/NSF ERC for Computational Field Simulation, and was privileged, through Joe’s mentorship, to work with both Rice University’s and NCSA’s supercomputing programs under Ken Kennedy and Larry Smarr, respectively. All three men were very generous to me, and that time in my life (the last big era of change in HPC before the current one) remains a really happy memory.
Ian Foster sent out a tweet this week with a link to a position paper Larry put together on his thoughts on what worked, and what didn’t, in the NSF supercomputer program. I recommend reading it in full, but here are some excerpts:
Increased the number of academic supercomputer users. It was estimated that before the 1985 launch of the NSF SC centers there were ~100 academic supercomputer users. After the first five years of the centers program, a two-orders-of-magnitude increase, as measured by those who logged onto one or another of the centers’ machines, was induced in the national academic HPC human resource pool. This vastly increased the scale of academic research using HPC and provided a pool for industry and the labs to hire from.
That’s just one of the many goods Smarr calls out, with others including the stimulation of hardware diversity and the incubation of the global Internet.
Lack of institutionalization of the centers. In spite of constant requests from the centers, NSF never institutionalized the centers program as it had NCAR, NRAO, NOAO, etc. Those centers are, respectively, where the nation computes atmospheric science, observes at radio wavelengths, and observes at optical wavelengths. The SC centers should be the sites where the academic community computes and where the staff support for things computational is housed. That is, select a few sites and give them the same multi-decadal guarantee of existence, with periodic reviews to maintain quality and user responsiveness. This would reduce a great deal of the endless rounds of existential worry and report writing that characterized the centers, at least during my 15 years as a director.
Other “bad” things he lists include a competitive culture between the centers and a narrowing of the mission that drove away creative thinkers.
The ugly? He lists three, but this is the greatest sin in my opinion:
Lack of balanced user-to-HPC architecture. From the beginning of the NSF centers program, a basic architectural concept was building a balanced end-to-end system connecting the end user with the HPC resource. Essentially, this was what drove the NSFnet build-out and the strong adoption of NCSA Telnet, which allowed end users with Macs or PCs to open multiple windows on their desktops, including windows onto the supercomputer and mass storage systems. Similarly, during the first five years of the PACI, both NPACI and the Alliance spent a lot of their software and infrastructure development effort on connecting the end user to the HPC resources. But it seems that during the TeraGrid era, end users have access to the TG resources only over the shared Internet, with no local facilities for compute, storage, and visualization that scale up in proportion to the capability of the TG resources. This sets up an exponentially growing data isolation of the end users: as the HPC resources get exponentially faster (thus exponentially increasing the size of the data sets the end user needs access to), the shared Internet throughput grows slowly, if at all.
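Smarr’s data-isolation argument is easy to make concrete with a toy model. Here is a minimal sketch with entirely illustrative numbers of my own choosing (none of the growth rates or sizes below come from his paper): assume working data set sizes track HPC capability and double roughly every 18 months, while the shared wide-area throughput available to an end user grows only about 10% per year.

```python
# Toy model of the "data isolation" gap (illustrative numbers only, not
# figures from Smarr's paper): data set sizes track exponentially growing
# HPC capability, while the end user's shared-Internet throughput grows
# much more slowly.

def transfer_hours(years,
                   dataset_gb0=100.0,           # assumed starting data set size (GB)
                   dataset_doubling_years=1.5,  # assumed HPC/data doubling period
                   net_mbps0=100.0,             # assumed starting WAN throughput (Mbit/s)
                   net_growth_per_year=1.10):   # assumed 10%/yr throughput growth
    """Hours to move one data set over the shared network, `years` from now."""
    dataset_bits = dataset_gb0 * 8e9 * 2 ** (years / dataset_doubling_years)
    net_bps = net_mbps0 * 1e6 * net_growth_per_year ** years
    return dataset_bits / net_bps / 3600

for y in range(0, 16, 5):
    print(f"year {y:2d}: ~{transfer_hours(y):8.1f} hours per data set")
```

Under these assumptions, moving a single run’s output grows from a couple of hours at year zero to hundreds of hours fifteen years out. However you tune the constants, exponential data growth pitted against slow network growth diverges, which is exactly the imbalance Smarr is warning about.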