New Whitepaper Compares HPC App Performance on 10GbE vs IB

Cisco has published a new whitepaper that compares HPC application performance on 10 Gig Ethernet vs. InfiniBand.

Despite the high bandwidth and performance claims of InfiniBand, Cisco has demonstrated that 10 Gigabit Ethernet is a viable solution for HPC clusters. The powerful Cisco C200 M2 server combined with Cisco Nexus 5000 Series Switches and RDMA NICs provides a low-latency solution that meets or beats QDR InfiniBand in real-world performance for leading HPC applications.

So readers, what do you think? Is this a fair comparison? Download the whitepaper (PDF).

Comments

  1. Funny… why weren’t the IB adapters tested on the same UCS servers?

  2. John Benninghoff says

    We did not have a QDR switch available to us.
    We did use the same CPUs as the published results, and we presumed our own IB results could be questioned, especially if we did not equal the published IB numbers.

  3. The results of the LS-DYNA car2car test are… let’s say “not fair”. For one node, Cisco’s cluster outperforms the other cluster. However, for one node the network technology won’t make any difference, since it isn’t used. That shows that the systems are not equivalent, and comparing their networks is therefore like comparing apples with oranges.

    If anything, these results prove that QDR InfiniBand is not worse: the more nodes, the smaller the performance difference, even though the proportion seems to remain stable. In other words, the network does not add any penalty on top of the worse single-node performance (see the sketch after the comments). But this problem looks embarrassingly parallel, so the network has little influence, and the test therefore proves close to nothing about the network.

    Moreover, the topcrunch.org site shows other systems with the same processor and QDR InfiniBand. For instance, the Altix XE1300 completes the benchmark in 4005 seconds (32 nodes), versus 4567 seconds for the SR1600UR system. That is better than the Cisco cluster.

    I cannot check the numbers for the other tests, but as long as there are more differences than just the network, I’ll be skeptical about the true capabilities of 10GbE vs. InfiniBand (even though 10GbE is an impressive step forward for Ethernet).

    Moreover, there is no data on how much these applications rely on the network. The first and the third look embarrassingly parallel, so there is almost no network usage. And in the third application the InfiniBand cluster performs better than the Cisco cluster in all cases except the last one, which is pretty difficult to explain. Are these measurements averages, best results, or just single runs? Many open questions.

  4. So this report isn’t accurate, to say the least. Can we get a real comparison, please?

  5. John Benninghoff says

    The car2car model is definitely not EP. You can verify that with LSTC and topcrunch.org.
    Neither are the Fluent benchmarks. The details of the benchmarks are provided by the ISVs, and results have been submitted by numerous OEMs over the years. Not sure why the benchmarks would suddenly become suspect.

    The Altix results at topcrunch.org were most likely obtained with a dual-rail configuration, which is why they exceed the Intel results with the same QDR IB and the same CPUs.

    The single-node Fluent results are within ~2%, which is probably within run-to-run variance.
    Not all x86 compute nodes are exactly equal, even when the CPUs run at the same frequency.

    The main point is not that 10GE is dramatically better than IB (it is not) but rather that it is surprisingly equal to IB performance on well-known MPI application benchmarks.

  6. They might not be EP, but to achieve this scaling the computation-to-communication ratio has to be high (or the two must overlap), which means that communication has little impact on the overall run unless the communication layer is extremely bad. Since both 10GbE and IB are good networks, their influence on the result is not very significant.

    It is not that the benchmarks became suspect. My point is that the comparison is not rigorous enough. As you say, not all x86 (or any other architecture) nodes are equal, and making a fair comparison with different hardware, different software, and different tuning is pretty difficult, especially if you base your conclusions on a single run. For instance, the single-node results for Fluent are within 2%, but the single-node results for Eclipse are closer to 11%; for 2 nodes the gap shrinks a little to around 9%, and then for 4 nodes the difference is around 50%. I cannot find any plausible explanation for that. Is 10GbE so good that even at just 4 nodes it makes such a difference? Is QDR IB so bad that it cannot match 10GbE scalability even at this small scale? I highly doubt it.

    The Altix results might have been obtained with dual-rail, or they might not. That’s why I said that to get a fair comparison between different networks it is necessary to keep the rest of the hardware and software the same (apart from the hardware and software you intend to compare).

    Of course it is a white paper; it does not have to be peer-reviewed or anything like that. I am just saying that the comparison could be fairer.

    However, I agree with your main point. 10GbE is a huge step forward for Ethernet, and it is surprisingly equal to IB. On these tests.

  7. That’s nice, but there is no cost comparison. “Almost as good as” InfiniBand for half the price would be great news. If the price is the same or more, I don’t really see the point.

  8. This whole paper is a big joke. Trying to “claim” that 10Gb/s Ethernet is faster than 40Gb/s InfiniBand (and you have FDR IB today at 56Gb/s) is something I would expect to see from “who cares” startups, but not from Cisco. Dear Cisco: go back to first grade and learn that 10G < 40G. Just your switch latency is three times the InfiniBand server-to-server latency…

    10GbE is not even close to what IB can provide. I tried it; it was a waste of time and money. If Cisco wants to get back into HPC, I would recommend that they sell IB.

  9. Agreed on the joke… it seems that with Cisco networking you get higher performance starting with a single server… magic…
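
For readers who want to test the normalization argument raised in comments 3 and 6, here is a minimal sketch of the usual parallel-efficiency calculation, E(n) = T(1) / (n × T(n)), which separates how well each cluster scales from how fast its single node is. The cluster names and timings below are hypothetical placeholders, not figures from the whitepaper or from topcrunch.org.

```python
# Minimal sketch: isolate the interconnect's contribution by normalizing each
# cluster's n-node time against its own single-node time (parallel efficiency).
# All cluster names and timings are hypothetical placeholders, not figures
# from the Cisco whitepaper or topcrunch.org.

def parallel_efficiency(t1: float, tn: float, n: int) -> float:
    """E(n) = T(1) / (n * T(n)); 1.0 means perfect linear scaling."""
    return t1 / (n * tn)

# cluster name -> {node count: elapsed seconds}
results = {
    "10GbE cluster": {1: 40000.0, 2: 20800.0, 4: 10900.0},
    "QDR IB cluster": {1: 42000.0, 2: 21700.0, 4: 11300.0},
}

for cluster, times in results.items():
    t1 = times[1]
    for n in sorted(times):
        eff = parallel_efficiency(t1, times[n], n)
        print(f"{cluster}: {n} node(s), E(n) = {eff:.2f}")
```

If the two efficiency curves track each other, the absolute gap between the clusters is mostly explained by single-node differences (CPUs, memory, tuning) rather than by the interconnect; diverging curves as node counts grow would be the signal that the network itself matters.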