Smarr on "the good, the bad, and the ugly" in the NSF supercomputer program

December 11, 2009 by Doug Black

I was finishing my graduate work in the early (and then the late) 1990s under Joe Thompson’s direction at the MSU/NSF ERC for Computational Field Simulation, and was privileged to work at that time through Joe’s mentorship with both Rice University and NCSA’s supercomputing programs under Ken Kennedy and Larry Smarr. All three men were very generous to me, and that time in my life (the last big era of change in HPC before the current one) remains a really happy memory.

Ian Foster sent out a tweet this week with a link to a position paper Larry put together laying out his thoughts on what worked, and what didn’t, in the NSF supercomputer program. I recommend a read, but here are some excerpts:

Increased the number of academic supercomputer users. It was estimated that before the 1985 launch of the NSF SC centers there were ~100 academic supercomputer users. After the first five years of the centers program a two orders of magnitude increase, as measured by those that logged onto one or another of the centers machines, was induced in the national academic HPC human resource pool. This vastly increased the scale of academic research using HPC and provided a pool for industry and the labs to hire from.

That’s just one of the many goods Smarr calls out, with others including stimulation of hardware diversity and incubation of the global internet.

Lack of institutionalization of the centers. In spite of constant requests from the centers, NSF never institutionalized the centers program as it had NCAR, NRAO, NOAO, etc. Those centers are, respectively, where the nation computes atmospheric sciences, observes with radio waves, and observes at optical wavelengths. The SC centers should be the sites where the academic community computes and where the staff support for things computational are housed. That is, select a few sites and give them the same multi-decadal guarantee of existence, with periodic reviews to maintain quality and user responsiveness. This would reduce a great deal of the endless rounds of existential worry and report writing which characterized the centers, at least during my 15 years as a director.

Other “bad” things he lists include a competitive culture between the centers and a narrowing of the mission that drove away creative thinkers.

The ugly? There were three, but this is the greatest sin in my opinion:

Lack of balanced user-to-HPC architecture. From the beginning of the NSF centers program, a basic architectural concept was building a balanced end-to-end system connecting the end user with the HPC resource. Essentially, this was what drove the NSFnet build-out and the strong adoption of NCSA Telnet, allowing end users with Macs or PCs the ability to open up multiple windows on their PCs, including the supercomputer and mass storage systems. Similarly, during the first five years of the PACI, both NPACI and the Alliance spent a lot of their software development and infrastructure developments on connecting the end-user to the HPC resources. But it seems that during the TeraGrid era, the end-users only have access to the TG resources over the shared Internet, with no local facilities for compute, storage, and visualization that scale up in proportion with the capability of the TG resources. This sets up an exponentially growing data isolation of the end users as the HPC resources get exponentially faster (thus exponentially increasing the size of data sets the end-user needs access to), while the shared Internet throughput grows slowly if at all.

Comments

Inside Tracker says

December 12, 2009 at 11:25 pm

Because of the way NSF has been organized and reorganized during the lifetime of the supercomputing centers up until the present, the control of the general direction(s) taken by the centers wavered back and forth between NSF and center leadership and among disciplines represented within NSF. The overall swing during the mid-1990s went from significant input from the scientific community as a whole toward control by a narrow group of disciplines centered on “computer science” rather than the computational sciences. At this point, the user community–as large as it had grown–was unable to maintain any single center’s systems in productive form. Everything became experimental, and center leadership became enamored of proposing and winning new toys for the computer scientists to play with. No regime of production was safe from disabling invasions of innovation that frequently stopped the science in its tracks and began instead a dance of experimentation on workflows, data management, and temperature-taking on systems through which no “heat” (scientific work) was flowing. Plain end-to-end calculation languished for all but the most determined and maniacal users, generally “star” professors with graduate students to waste on minding what could be had of serious computational resources.

To be sure, the throughput and data innovations, once they were sorted out and functioning, were very useful to the scientific community. But they ought to have been developed in parallel to, rather than in place of, scientific advances. Many, many users were disaffected as actually getting into the queues and running began to take second and third place to everything else.

There is a limit to the extent to which any discipline that calls itself “X science” partakes of the scientific, whether it be “political science,” “social science,” or, sorry dear friends, “computer science.” When computer science ruled, biology, chemistry, and physics at the centers suffered. A lot of this was obscured by the constant growth in machine power, but I have no doubt that it was felt in many a lab. Other computational scientists may have more to say on this point.
