Primer on InfiniBand for cluster buyers


The latest issue of HPC Projects has an article covering the basics of InfiniBand interconnects that anyone considering building a cluster (or anyone who already has an Ethernet cluster) needs to know. We seem to have a theme running through recent posts on this InfiniBand/Ethernet debate these days…

For many people configuring an HPC system, the first thought when it comes to the interconnect might be to use the Ethernet ports that come standard on virtually every server and are therefore essentially ‘free’, adding no apparent cost to the system. But does this logic really hold water?

In many cases, it does not. By adding an interconnect that transfers data at much greater speeds and with lower latency, you can improve system performance to the point where applications run much more quickly, saving engineering and design time and possibly eliminating the need for additional servers.

The article outlines the basics of the technology and the various flavors of IB gear you can find today, and offers some views on where and why it has advantages over Ethernet. If you aren’t very familiar with InfiniBand, it’s a good read.

In comparing InfiniBand and Ethernet, says Voltaire’s Somekh, one of the most important parameters to look at is network efficiency: what is the impact of the network on application efficiency? This simple metric, he believes, captures the case for the alternative approach. With large data transfers, Ethernet consumes as much as 50 per cent of the CPU cycles; the average loss for InfiniBand is 10 to 20 per cent or less. So, while you might not pay more in hardware to implement an Ethernet network, for HPC you will spend longer running applications to get results, which means extra development and analysis time, or you might end up purchasing extra compute nodes to provide the horsepower.
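
To see how that overhead compounds, here is a rough back-of-envelope sketch (an illustration only, not from the article; it assumes the interconnect simply steals a fixed fraction of CPU cycles from the application, while real codes overlap communication and compute to varying degrees):

    /* Rough sketch: how interconnect CPU overhead inflates runtime.
       Overhead fractions below are the figures quoted above; the model
       simply assumes that fraction of the CPU is lost to the application. */
    #include <stdio.h>

    int main(void)
    {
        double overhead[] = { 0.50, 0.20, 0.10 };   /* Ethernet+TCP, IB high, IB low */
        const char *label[] = { "Ethernet/TCP (~50%)",
                                "InfiniBand   (~20%)",
                                "InfiniBand   (~10%)" };

        for (int i = 0; i < 3; i++) {
            double usable = 1.0 - overhead[i];      /* CPU left for the application */
            double factor = 1.0 / usable;           /* runtime (or node-count) inflation */
            printf("%-22s usable CPU %3.0f%%, runtime factor %.2fx\n",
                   label[i], usable * 100.0, factor);
        }
        return 0;
    }

On that simplistic model, a 50 per cent overhead means a job takes twice as long (or needs twice the nodes) compared with an ideal interconnect, while 10 to 20 per cent costs you roughly 1.1x to 1.25x.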

Comments

  1. You are totally confusing things. You can run IB-like protocols over Ethernet. Mellanox does this with RDMAoE and the performance is almost the same. Myricom has even been doing MXoE for 4 years now (and the performance there was exactly the same, since there is no MTU problem or encapsulation overhead as there is with RDMAoE).
    The problem is not in Ethernet itself; it’s either in the TCP/IP stack layered on top, or in the fact that most Ethernet NICs do not offer advanced features such as RDMA. Ethernet as such does not have the drawbacks you’re claiming when compared to IB.

  2. Brice – You are drawing a distinction between Ethernet as a substrate and the TCP/IP stack that sits on top of it, and yes, you are right that they are different. The primer (which was written at HPC Projects, not by me, just to be clear, since you used the personal pronoun in your comment) addresses the common view of Ethernet+TCP/IP, which is the view that someone coming fresh to InfiniBand and/or building one of their first clusters is likely to bring to the table. Someone who understands the distinction you are making likely wouldn’t need the primer in the first place.

  3. Brice’s point is valid (see http://blogs.cisco.com/ciscotalk/performance/comments/lies_damn_lies_and_statistics/). Beginners who read the HPCprojects article will likely conclude “Ethernet = high latency.” That’s like saying “General Electric = appliances”: GE may be *best* known for its appliances, but it has something like 12 or 15 other core businesses (jet engines, anyone?). So, too, Ethernet may be best known for TCP, but there are many other (significantly more efficient, lower-latency, etc.) transports available. They’re just a kernel module and userspace library away, no different than OpenFabrics or InfiniBand (see the verbs sketch at the end of this thread).

    IMHO: To someone who is familiar with the market, this “primer” is pretty thinly veiled marketing. To a newbie, it is structured to guide them to the opinion that IB is the solution they need.

    Just my $0.02 (and to be clear, just like I said in my blog entry — I’m just as biased as anyone else 🙂 ).

    (sorry you get the brunt of this, John — there doesn’t appear to be a way to make comments on the HPCprojects site 🙂 )

  4. Hey Jeff – that just means that you’re finding my site useful, so I’ll happily host the argument. By the way, I moved your link to your original post.

    Oh, I responded to your post on your blog, by the way. I appreciate the time that you (and Brice) took in responding, and I yield to you both.
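
    For readers who want to see Brice’s and Jeff’s point in practice, here is a minimal sketch using the OpenFabrics verbs library (libibverbs from rdma-core). It is just an illustration, not something from the HPC Projects article: the same userspace API enumerates RDMA-capable devices whether the link underneath is InfiniBand or Ethernet.

        /* Minimal libibverbs sketch: enumerate RDMA-capable devices.
           The same OpenFabrics verbs API is used whether the device is an
           InfiniBand HCA or an RDMA-capable Ethernet NIC.
           Build (assuming rdma-core/libibverbs is installed):
               gcc list_rdma.c -o list_rdma -libverbs */
        #include <stdio.h>
        #include <infiniband/verbs.h>

        int main(void)
        {
            int num = 0;
            struct ibv_device **devs = ibv_get_device_list(&num);

            if (!devs || num == 0) {
                fprintf(stderr, "no RDMA-capable devices found\n");
                return 1;
            }
            for (int i = 0; i < num; i++) {
                /* The verbs consumer just sees a device name and node type;
                   the link layer underneath is a per-port detail. */
                printf("device %s (node type: %s)\n",
                       ibv_get_device_name(devs[i]),
                       ibv_node_type_str(devs[i]->node_type));
            }
            ibv_free_device_list(devs);
            return 0;
        }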