Birds do it, bees do it

Well, everyone else is putting pen to paper on what they did with their time at SC, so here’s mine. I’ll skip the things that everyone else is already writing about and jump to the few observations I haven’t seen elsewhere.

During the course of the conference I had 45 meetings with the companies, programs, individuals, and agencies shaping the direction our community is headed in both the near and far term. In the areas of the conference I was trolling, the emphasis was decidedly short term: the 2020 exascale challenge was an undercurrent, but not an overarching theme. Dan Reed had a different experience, probably because he actually spent time in the technical program (a scheduling mistake on my part that I don’t intend to repeat next year). On my promenades around the floor, I saw two trends.

Into pieces and parts

First, there were several companies presenting technical solutions that allow datacenter managers to break up the traditional concept of a compute node, freeing them to compose systems from pieces and parts at a finer level of granularity. Companies such as RNA Networks, 3Leaf Systems, Avere Systems, and NextIO all occupy different segments of this trend. NextIO is an exemplar. Its solution allows cluster builders to deploy compute nodes with only a processor and local memory, leaving the IO (from network ports to GPUs and drives) to be deployed top-of-rack in a separate chassis connected over PCI Express to the nodes that need access to those resources. As the company says, this lets you separate decisions about IO from decisions about compute.

A use case? Rather than adding GPUs to just a few servers in your small cluster, you could put them in a NextIO device that makes them available to all of the servers. The point is not that all the servers would use them simultaneously (that would be bad), but that you wouldn’t have to decide a priori which servers get the GPUs, allowing you to avoid idle resources held “just in case” a GPU user showed up.

The other providers in this market offer solutions that focus on memory virtualization, or combinations of memory and disk. As a whole they are beginning to gain traction with the major vendors, and most have announced partnerships with at least some of the Tier 1 OEMs.

From GPUs to accelerators

The other major technical theme I saw was the beginning of the generalization of the GPU. Two years ago custom accelerator company ClearSpeed was the darling of SC; less than a year later it had disappeared from the conversation, supplanted by the juggernaut that is GPGPU-based technical computing. GPU manufacturer NVIDIA has incredible momentum right now, but it’s useful to remember that what goes up may also come down. An early reminder of this is that a significant fraction of the attendees and exhibitors I spoke with at SC09 are already thinking of GPGPUs as a specific case of accelerated computing, and aren’t referring to GPUs at all. NVIDIA is facing a strong first-mover disadvantage: there is a real risk that others will come into the market with a different technology implemented in a better way. There is also a growing number of developer tools that support expressing parallel work in a form that can be mapped to a variety of accelerators, including but not limited to GPUs (The Portland Group’s compilers are an example, as is OpenCL). As developers adopt these tools, moving their codes to new technologies will get easier, to NVIDIA’s potential disadvantage.
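To make the portability point concrete, here’s a minimal OpenCL host sketch (my own illustration, not taken from any vendor’s SDK; the trivial “scale” kernel is hypothetical). The interesting part is the device query: the code asks the runtime for CL_DEVICE_TYPE_ALL and compiles the same kernel source for whatever accelerator it finds, rather than being written against a particular GPU.

```c
#include <stdio.h>
#include <CL/cl.h>

/* A trivial data-parallel kernel: each work-item scales one element.
 * Nothing in this source names a GPU, or any other device type. */
static const char *src =
    "__kernel void scale(__global float *x, float a) {\n"
    "    size_t i = get_global_id(0);\n"
    "    x[i] = a * x[i];\n"
    "}\n";

int main(void) {
    cl_platform_id plat;
    cl_device_id dev;
    cl_int err;
    char name[256];

    if (clGetPlatformIDs(1, &plat, NULL) != CL_SUCCESS) {
        fprintf(stderr, "no OpenCL platform found\n");
        return 1;
    }

    /* CL_DEVICE_TYPE_ALL: accept whatever accelerator the runtime
     * exposes -- a GPU today, something else tomorrow. */
    if (clGetDeviceIDs(plat, CL_DEVICE_TYPE_ALL, 1, &dev, NULL) != CL_SUCCESS) {
        fprintf(stderr, "no OpenCL device found\n");
        return 1;
    }

    clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof(name), name, NULL);
    printf("building kernel for: %s\n", name);

    /* The same source string is compiled for whichever device we found. */
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    err = clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    printf("build %s\n", err == CL_SUCCESS ? "succeeded" : "failed");

    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return 0;
}
```

With any vendor’s OpenCL runtime installed, this compiles with something like cc example.c -lOpenCL, and the same host code runs against whichever implementation is present. That hardware-agnosticism is exactly what loosens a developer’s tie to any one GPU maker.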

NVIDIA is also facing business risk: with Intel moving to bring high performance graphics capabilities directly into the CPU, NVIDIA may find its volume graphics board business under siege. The primary attribute that makes NVIDIA’s GPUs attractive to HPC users is their very low price for the performance they deliver. If NVIDIA loses the commodity multiplier that allows it to keep prices so low on its GPUs, the HPC community will look for other alternatives. In HPC, “good enough” is good enough, and volume always wins.