SMPs. Clusters. Accelerators. Next? Personal supers.

Joe Landman has a thoughtful post at his blog on the Next Big Thing in HPC that riffs off a post by Doug Eadline at Linux Mag. Doug talks a little bit about the history of HPC, and then asks this question:

Given that the number of cores in a processor continues to grow (e.g., the new six-core processor from AMD), single memory domains (motherboards) may have anywhere between 12 and 32 cores in the near future. Here is an interesting scenario. Let’s assume that 12-32 core systems become commonplace. If this is enough computing power for your tasks, then how will you approach HPC programming? Will you use MPI because you may want to scale the program to a cluster, or will you use something like OpenMP or a new type of multi-core programming tool because it is easier or works better? Could a gulf in HPC programming develop? Perhaps MPI will still be used for “big cluster HPC” and other methods may be used for “small motherboard HPC”. Of course MPI can always be used on small core counts, but will some point-and-click thread-based tool attract more users because “MPI is too hard to program”?
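To make the potential gulf concrete, here is a minimal sketch, assuming a simple reduction written in C; the loop, array size, and compile flags are my own illustrative choices, not anything from either post. The shared-memory version needs little more than one pragma:

```c
/* Shared-memory version: compile with, e.g., gcc -fopenmp sum_omp.c
 * One pragma turns a serial loop into a parallel reduction across
 * the cores of a single motherboard. */
#include <stdio.h>

int main(void)
{
    const int n = 1000000;
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1.0);

    printf("harmonic sum = %f\n", sum);
    return 0;
}
```

The message-passing equivalent makes the programmer own the domain decomposition and the explicit reduction across ranks:

```c
/* Distributed-memory version: compile with mpicc, run with mpirun.
 * The same loop, but split across ranks by hand and reduced explicitly. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    const int n = 1000000;
    int rank, size;
    double local = 0.0, sum = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Round-robin split of the index range across ranks. */
    for (int i = rank; i < n; i += size)
        local += 1.0 / (i + 1.0);

    MPI_Reduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("harmonic sum = %f\n", sum);

    MPI_Finalize();
    return 0;
}
```

Neither is exotic, but for the 12-32 core, single-motherboard case Doug describes, the second clearly carries more ceremony, which is exactly the pressure behind the “MPI is too hard to program” complaint.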

Joe points out that he sees this with his customers, and we already see it every day among my center’s users. I run a large HPC center where we have 8,000 and 15,000 core machines and about 900 users. We are built to run big jobs, or more accurately lots of medium-sized jobs, because of the dynamics of the open processor market created by backfilling job schedulers and a zeal for utilization percentages in the 90s. In this environment, which is also laden with all kinds of extra overhead and unfriendly security policies, anyone who can run on their own gear, and can afford to, does. This is actually a good thing from my perspective, because the dynamic tends to leave us with just the users who need the kinds of resources only a big program can provide, and those are the users who should use our resources. So every year people gather up year-end money and go buy $25,000 to $40,000 workstations that are way more powerful than the 64-processor nCube I used to have under my desk when I first started. They don’t have to wait in line to run production jobs, they don’t have to mess with weird security rules and helper applications to authenticate their credentials, and they get a much easier development environment where they don’t have to queue up to test a bug fix.

Joe also walks through a little bit of history in HPC, but his history follows disruptive trends, and his point is to arrive at a reasonable vector for the near future of HPC technology. The walk, in brief:

Twenty years ago, vector supers began to see the glimmering of a challenge from the killer supermicros…

Fifteen years ago, the battle was over, and supermicros had won. There were these new Pentium II systems that most in the supermicro world looked down on. I ran some tests on those, and found that the cost benefit analysis was going to favor them in the longer term. 1/3 the performance for 1/10th the price….

Ten years ago, clusters started emerging with a vengeance. …

One year ago, accelerators began their emergence in earnest.

Where does all this point?

So what I see as the up-and-coming generation are these personal supers. They currently offer compute power once available on small to moderate-sized clusters. Back these up with a remote cluster in your machine room, or at Newservers, Amazon, Tsunamic Technologies, and you have local and remote power for your computing. The only remaining issue with the remote power is the data motion, and this is solvable, if need be, with FedEx/UPS. That is, it is an eminently solvable problem, even if it is not elegant to solve.
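The FedEx point is easy to believe with a back-of-the-envelope calculation; the shipment size, transit time, and link speed below are my own illustrative assumptions, not figures from Joe’s post:

```c
/* Rough comparison of "courier bandwidth" vs. a WAN link.
 * All three constants are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
    const double shipped_bytes    = 4e12;      /* 4 TB of disks in a box */
    const double transit_sec      = 24 * 3600; /* overnight shipping     */
    const double wan_bits_per_sec = 100e6;     /* 100 Mb/s site uplink   */

    double courier_mbps   = shipped_bytes * 8.0 / transit_sec / 1e6;
    double wan_tb_per_day = wan_bits_per_sec / 8.0 * 86400.0 / 1e12;

    printf("effective courier bandwidth: %.0f Mb/s\n", courier_mbps);
    printf("WAN transfer per day:        %.1f TB\n", wan_tb_per_day);
    return 0;
}
```

Under those assumptions the box of disks sustains roughly 370 Mb/s while the uplink moves about a terabyte a day, which is why inelegant beats infeasible here.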

Joe’s picture seems right to me. As I’ve said before, I also think that eventually most of the Top500 will be hosted outside the home organization’s datacenter. Only those for whom hosting the largest systems is part of a competitive advantage will have the will, and the funding, to sit next to their own machines (even if their users don’t). Those organizations for whom computing is simply a tool, and not the object of the work itself (I’m thinking of the DoD here, with respect to its provision of production compute for the military R&D community, but there are certainly other great examples), will focus on providing value farther up the chain (in computational scientists, for example) and leave the business of building out infrastructure and worrying about OS patches and server uptime to someone else.

Having your own power generator used to be a competitive differentiator for manufacturing plants, until the distribution of electricity became part of a national infrastructure. Once there was choice, having your own generator became a cost that businesses had to bear, and when those costs outweighed the benefits, businesses junked or sold their generators and installed plugs connected to Edison Electric. Then the thinking moved up the value chain: they had power as a given, and now they got to make sure they had the right stuff to plug into those outlets so they could run their business better than everyone else. I think the same thing will happen with most big HPC that isn’t about research in HPC itself (the machines, the operating systems, etc.). But one shouldn’t get too excited about the pace at which this change will occur. A lot of big HPC is funded directly or indirectly by federal governments, and these guys don’t change quickly.