On my recent trip to ISC10 in Hamburg this month, the real buzz centered on the industry reaching sustained exaflops in the next decade or so. And while this milestone is daunting enough in terms of processing, it struck me that the challenge of getting that many cores connected with low latency is going to be the highest hurdle of all. Is InfiniBand going to get us there, or will it require something way beyond current technology trends? To find out, I caught up with Brian Sparks and Clem Cole from the InfiniBand Trade Association to talk about IBTA’s latest performance roadmap.
insideHPC: This is TOP500 week. How did IB fare in the rankings?
Brian Sparks: We did really well. InfiniBand is now deployed on 208 of the TOP500 systems, and that’s an increase of 37 percent from a year ago. And that success is really reflected in the upper tiers. In the Top100, we have 64 systems. And in the Top10, we have five systems. InfiniBand has powered Top10 systems on every list since 2003, so what you’re seeing is our momentum continuing to increase.
insideHPC: Are you gaining TOP500 “share” at the expense of proprietary interconnects?
Brian Sparks: A lot of what it’s eaten has been the proprietary interconnects. If you look at all the proprietary links combined, I think it’s only 25 to 28 clusters on the TOP500. The remaining gains have come from GigE, which has gone down to the 235-240 range. There are also a couple of 10 GigE clusters finally entering the mix now.
Clem Cole: We aren’t here to bash anyone, but I think what Brian describes is correct. I’m enough of a gray-haired historian on this subject to say that the proprietary guys face a shrinking value proposition. That doesn’t mean there isn’t room for them. I absolutely believe that for the Top5 kind of customers, there will always be value in somebody inventing something that takes it to the next level. Will they do it by starting with IB, or will they start by blowing up IB the way IB did with Ethernet? I don’t know. If they do go out on their own, history has shown that it’s very tough to survive. The point is, I think that for the bulk of the TOP500, the guys who really want a multicomputer site environment with a multi-vendor ecosystem, that’s what IB is about.
insideHPC: What are the highlights of the latest IB roadmap?
Brian Sparks: The bottom line is really that IB continues to evolve with leading bandwidth performance and other enhancements. IBTA’s current roadmap calls for 4x EDR ports at 104Gb/s data rates in 2011. That’s 26Gb/s per lane, and that’s a significant uptick from IBTA’s previous roadmap from June 2008, where we projected 4x EDR at less than 80Gb/s in 2011. The other performance news is that the spec is moving from 8b/10b to 64b/66b encoding, so the data throughput will now better match the link speed. Taking the new encoding into account, EDR will deliver over 3X the data bandwidth that QDR now provides.
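To see where that 3X figure comes from, here is a quick back-of-the-envelope check in C, using the per-lane rates quoted above (the exact EDR line rate was still being finalized at the time, so treat the 26Gb/s number as the roadmap figure rather than a final spec value):

```c
#include <stdio.h>

int main(void) {
    /* Per-lane signaling rates quoted above (Gb/s), on a 4x link */
    double qdr_signal = 4 * 10.0;   /* QDR: 40 Gb/s signaling  */
    double edr_signal = 4 * 26.0;   /* EDR: 104 Gb/s signaling */

    /* 8b/10b encoding carries 8 data bits in every 10 bits on the wire;
       64b/66b carries 64 data bits in every 66 bits on the wire. */
    double qdr_data = qdr_signal * 8.0 / 10.0;    /* ~32 Gb/s    */
    double edr_data = edr_signal * 64.0 / 66.0;   /* ~100.8 Gb/s */

    printf("QDR data rate: %.1f Gb/s\n", qdr_data);
    printf("EDR data rate: %.1f Gb/s\n", edr_data);
    printf("EDR/QDR ratio: %.2fx\n", edr_data / qdr_data);
    return 0;
}
```

Running this gives roughly 32Gb/s of data bandwidth for QDR versus about 101Gb/s for EDR, a ratio of about 3.15x, which is where the “over 3X” claim comes from.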
Clem Cole: I think the big message is that the performance gains are going to keep coming. So if you look at the gains we’re making from current speeds of 40Gb/s to over 100Gb/s in 2011, IB continues to be a very economical solution to one of the toughest problems in High Performance Computing.
insideHPC: Clem, you were part of the IBTA effort from the beginning. Can you tell me more about where IB came from?
Clem Cole: Well, you really need to go back to the beginning. What we think of as clustering today is really an old idea that goes back to Gerald Popek at UCLA in the early ’70s. Jerry was the first one to put his finger on this idea of taking multiple computers and orchestrating them as one big system.
So we could all see that this was the way the industry was going. And along the way, people started building these custom interconnects, and we had a situation where the big vendors all had their own proprietary network technology going. At DEC we had something called Memory Channel, and we were facing big development costs, like $60-100 million for the next generation.
Remember that this is all driven by money. So we as vendors all needed a high-bandwidth technology that was going to meet the demand, but not one of us was going to make any money if we continued to fight each other. So InfiniBand originated from the 1999 merger of two competing designs: Future I/O, developed by Compaq, IBM, and HP, and Next Generation I/O, developed by Intel, Microsoft, and Sun.
So in the end, what made InfiniBand, and for that matter Ethernet, go was everybody agreeing that this was good enough. So instead of trying to differentiate with the interconnect, we could do our own value-add in other areas.
insideHPC: How important has the OpenFabrics project been to industry adoption of InfiniBand?
Clem Cole: I think OpenFabrics was one of the most important milestones in getting IB to be successful. The OpenFabrics Alliance took the OpenFabrics Enterprise Distribution upstream and helped make it part of the Linux kernel. So now you’ve got distributions like SUSE and Red Hat where it’s just in there. And don’t forget Windows as well.
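To give a feel for what “just in there” means in practice, here is a minimal sketch that simply enumerates the InfiniBand devices the in-kernel stack exposes. It assumes the stock libibverbs library that ships with OFED and with those distributions, and it is meant as an illustration rather than a complete program:

```c
/* Build with: gcc list_ib.c -libverbs   (requires libibverbs from OFED) */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void) {
    int num = 0, i;

    /* Ask the verbs library for every RDMA-capable device the kernel knows about */
    struct ibv_device **list = ibv_get_device_list(&num);
    if (!list) {
        perror("ibv_get_device_list");
        return 1;
    }

    printf("Found %d device(s)\n", num);
    for (i = 0; i < num; ++i)
        printf("  %s\n", ibv_get_device_name(list[i]));

    ibv_free_device_list(list);
    return 0;
}
```

The point of the sketch is that the application talks only to the standard verbs API; no vendor-specific driver code is needed, because the stack is already part of the distribution.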
insideHPC: InfiniBand has a reputation of being much harder to implement and manage than Ethernet. Does the IBTA recognize this as an issue that needs addressing?
Brian Sparks: I think that for HPC, IB has gone through a lot of evolution in terms of ease of use. When it first came out, you had a lot of scientists who were eager to play around with things and make it work. And now as you start going into enterprise solutions, people just want to drop it in and not worry about it. Over the years, we’ve been able to make that possible.
As an organization, IBTA has been trying to address InfiniBand’s reputation for being difficult to work with. We recently came out with an eBook called Introduction to InfiniBand for End Users. It’s kind of an IB for Dummies document with some key know-how, such as what IB management looks like and how it differs from what you’re used to in terms of Ethernet management.
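One concrete example of that management difference: an InfiniBand subnet is configured by a subnet manager, which brings ports up and assigns each one a local identifier (LID), rather than relying on switches to learn addresses the way Ethernet does. The sketch below (again assuming libibverbs, with error handling trimmed for brevity) shows how an application can read that subnet-manager-assigned state through the standard verbs API:

```c
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void) {
    int num = 0;
    struct ibv_device **list = ibv_get_device_list(&num);
    if (!list || num == 0) { fprintf(stderr, "no IB devices found\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(list[0]);
    struct ibv_port_attr attr;

    /* Port 1 attributes: the LID and SM LID reported here are assigned by
       the subnet manager, not learned by switches as in Ethernet. */
    if (ctx && ibv_query_port(ctx, 1, &attr) == 0) {
        printf("port state: %s\n",
               attr.state == IBV_PORT_ACTIVE ? "ACTIVE" : "not active");
        printf("LID: 0x%x  SM LID: 0x%x\n", attr.lid, attr.sm_lid);
    }

    if (ctx) ibv_close_device(ctx);
    ibv_free_device_list(list);
    return 0;
}
```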
So InfiniBand continues to evolve, and these efforts are really important because IB isn’t just for supercomputers and hard-core scientists any more. IB lets you add a server any time you want, and for things like cloud computing that’s a great value to the enterprise as well.