This is interesting: Timothy Prickett Morgan at The Reg is reporting on a benchmarking job by cluster maker Advanced Clustering Technologies, which compared the performance (and price/performance) of two of its own two-socket servers, one AMD and one Intel, running Linpack. Everything else, including the compiler, stayed the same.
The machines tested
In one corner, a Pinnacle rack server equipped with two quad-core 2.66GHz Xeon X5550s, which each have a 95 watt thermal envelope. (I probably would have chosen the 2.53GHz E5540 with an 80 watt power dissipation, which will probably be more popular for HPC clusters.) This machine was equipped with 12GB of 1.33GHz DDR3 main memory.
In the other corner, another Pinnacle rack server, but one with two six-core 2.6GHz Opteron 2435s and 16GB of 800MHz DDR2 main memory. This chip is rated at a much lower 75 watts. (Another reason why the E5540 might have been a better choice for a comparison, but any comparison has compromises.)
And the results
Here’s what happened. The Xeon X5550 box had a peak theoretical number-crunching performance of 85.12 gigaflops, and delivered 74.03 gigaflops on ACT’s Linpack run. That means the machine delivered 86.97 per cent of the theoretical performance on the actual workload. The Intel box cost around $3,800 as configured, which worked out to $51.33 per gigaflop.
With six cores running at almost the same speed, you’d expect the AMD box to do better than this, and indeed it did. The Opteron 2435 machine had a peak theoretical performance of 124.8 gigaflops, and it delivered 99.38 gigaflops on the Linpack run. While this was only a 79.63 per cent efficiency, more is more. That’s a nice right hook, and the fact that the Opteron node only costs $3,500 is a nice uppercut, yielding a much lower $35.21 per gigaflop.
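For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the peak, efficiency, and dollars-per-gigaflop figures from the specs quoted above. The one thing it assumes that the story doesn't spell out is 4 double-precision flops per core per clock cycle, which is the value that makes both quoted peak numbers work out. The `Node` class and its method names are just illustrative.

```python
from dataclasses import dataclass

# Assumption (not stated in the story): each core retires 4 double-precision
# flops per clock cycle. This value reproduces both quoted peak figures.
FLOPS_PER_CYCLE = 4


@dataclass
class Node:
    name: str
    sockets: int
    cores_per_socket: int
    ghz: float             # core clock in GHz
    linpack_gflops: float  # measured Linpack result
    price_usd: float       # approximate price as configured

    def peak_gflops(self) -> float:
        # Theoretical peak = sockets x cores x clock x flops per cycle
        return self.sockets * self.cores_per_socket * self.ghz * FLOPS_PER_CYCLE

    def efficiency(self) -> float:
        # Fraction of theoretical peak actually delivered on Linpack
        return self.linpack_gflops / self.peak_gflops()

    def usd_per_gflop(self) -> float:
        # Price per delivered (not peak) gigaflop
        return self.price_usd / self.linpack_gflops


xeon = Node("Xeon X5550", 2, 4, 2.66, 74.03, 3800)
opteron = Node("Opteron 2435", 2, 6, 2.6, 99.38, 3500)

for node in (xeon, opteron):
    print(
        f"{node.name}: peak {node.peak_gflops():.2f} GF, "
        f"efficiency {node.efficiency():.2%}, "
        f"${node.usd_per_gflop():.2f} per GF"
    )
```

Note that the price figure is divided by delivered gigaflops, not peak, which is why the Opteron box wins on price/performance despite its lower efficiency. The output should match the numbers quoted above to within a cent.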
The story is quick and worth a read, because there are some interesting tidbits in there (for example, the Intel compiler seems to do the best job on the AMD processors).
Yes, it was six-core compared to quad-core, but then again each chip is the top end available from its manufacturer, so there is some merit in the comparison. The only quibble I have with the test is that HPL isn’t a real workload; in fact, it’s fairly unrepresentative of real work, at least in most high performance technical computing. But it’s a start. Now, someone do the same thing with WRF and GAMESS…