10 GbE ready to move into HPC in a big way

Doug Eadline over at Linux Mag predicts that it's time for 10 GbE to move into HPC in a serious way:

My 10 GigE prediction is based on the following rule of thumb, Speed, Simplicity, Cost, pick any two. I believe 10 GigE will win because of simplicity and cost. IB is already faster and has better latency and if you need this level of performance you are not even looking at Ethernet direction. The joy of clustering is that one size does not fit all and you can build your cluster around your needs.

More in the article, and don’t head over there thinking that Doug has positioned himself in a “10 GbE is the bestest network ever for everything so there” kind of argument. The article is a good read.


  1. I’ll step in and happily argue a bit here, but as with any argument we need to set the foundations first – one, I’m partially arguing semantics, since as Doug rightly points out, plain ol’ 1GbE connections are outdated and, most likely, insufficient to handle the types of data-intensive workloads being run these days. Does this mean 10GbE is ready to ‘move into HPC in a big way’? Well, sure, but I view it as ‘moving in’ in much the same way that a hypothetical spaceship being pulled by a black hole is ‘moving into it’ in a big way – it’s going, and fast, but rather than moving by choice, it’s being pulled because that’s all that CAN happen. (My analogies are weak today; it’s one of those weeks!)

    At the small but high-end part of the spectrum, 10GbE can’t compete with IB, and you’ll only find it in places that tack it on due to marketing or a sales push – for example, a Dell chassis that has 10GbE uplinks, and thus a 10GbE network, even if the nodes are already using DDR or QDR IB. Does this count as adoption, when nearly all relevant traffic will route over the IB stack? I’d argue no. On the lower end, 1Gb Ethernet is still fine, and dirt cheap – sure, as you get faster processors the share of run time spent on communication will increase and 1GbE’s scalability will drop on a given number of nodes, but if a cluster is seeing decent scalability now at, say, 128 nodes with 1GbE, it seems reasonable to assume that it would stay mostly scalable even with faster nodes. Maybe more important, and I’ll admit there’s a fair amount of uncertainty here, I find that HPC adoption is outpacing HPC scaling – why does this matter? Well, a 1024-core machine with 10 users might certainly, in a year’s time, become a 2048-core machine… but with 30 users. So how many users are doing full-system simulations, versus using the same or a smaller relative fraction of nodes? I’d wager there are very few who continually scale up their runs; most instead run more ensemble-type calculations. And the ones who do scale up? Well, they go to the high-end machines.

    Finally, this leaves us with the middle ground. Here we may see some 10GbE adoption, partly because it is easy (and, this being the mid-range, price is less of a concern than it is for the low-end folk). A lot of these purchases are made with less application-centric knowledge and more of a ‘get us a system’ mentality, and the ‘false middle’ fallacy hits people – 1Gb is too slow but cheap, and QDR IB is expensive, therefore the mid-range option, 10GbE, is ideal. And going back to Doug’s idea, he does say that the price of Ethernet switches has dropped enormously in the past, and certainly if that happens to 10GbE (as I’m sure it will over time), the same sort of adoption can happen here. Thus, coming full circle, while I started arguing with Doug on semantics, I’ll end with specifics – a year is a long time in HPC, but I’d still be surprised if 10GbE is the dominant interconnect in the Top 500 one year from now. Three years? Sure, I could see that, but not in one year.

    … Of course, I’m often wrong. :-)

    – B

    Two, Doug says, “Therefore, because I’m talking about 10 GigE does not mean I am predicting the demise of IB. More like I am predicting the demise of GigE use in clusters.” This, I can agree with.

