Problems with big HPC on the horizon

Gary Montry has an interesting article in HPCwire summarizing some of the difficulties that giant HPC has looming on the horizon, as discussed at the LCI conference.

The first two keynotes set the tone by describing the perils and pitfalls of installing huge systems and getting them to perform. Even after a few years, all of the pieces don’t necessarily play together well enough to meet the original design objectives. …He [Horst Simon] noted that even though we have started construction of a petaflop computer, there are presently only two general-purpose machines in the world capable of 100+ teraflops on the Linpack benchmark.

This was a perfect segue from the opening keynote Monday evening by Robert Ballance of Sandia National Laboratory (SNL) about the difficulties of assembling Red Storm and getting it to perform. …Why? Because it was much bigger than anything they had previously built. So the old saw in computing, “if it’s 10x bigger, it is something entirely new,” still holds, and we should not expect a petaflop machine to come together quietly at this moment in HPC time.

One interesting observation which Horst made in his talk is that programming a 100,000+ core machine using MPI is akin to programming each transistor individually by hand on the old Motorola 68000 processor, which of course had only 68,000 transistors. That wasn’t so long ago to most of us, and his point is that we can’t grow too much more in complexity unless we have some new software methodology for dealing with large systems.
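To make Horst's point a bit more concrete, here is a minimal MPI sketch (my own illustration, not from the article): even a trivial ring exchange forces the programmer to spell out exactly which rank sends to which, and that hand-managed bookkeeping only multiplies as applications scale past 100,000 cores.

/* Minimal MPI ring exchange in C: every message between ranks is
 * managed explicitly by the programmer. This is the kind of low-level
 * bookkeeping that becomes unwieldy at 100,000+ cores. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank hand-codes its neighbors in the ring. */
    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;
    int send_val = rank, recv_val = -1;

    MPI_Sendrecv(&send_val, 1, MPI_INT, right, 0,
                 &recv_val, 1, MPI_INT, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d of %d received %d from rank %d\n",
           rank, size, recv_val, left);

    MPI_Finalize();
    return 0;
}

Even in this toy case the communication pattern is wired in by hand; real applications juggle thousands of such exchanges, which is why Horst argues we need a new software methodology before complexity grows much further.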

A problem underlying both the deployment and performance issues is the lengths we have to go to in order to get software (both applications and the OS) to work at scale. And it's a problem that commodity developers are now staring down as well.