Joe Landman and the fine folks at Scalable Informatics have just published a new whitepaper [PDF] comparing the performance of the National Center for Biotechnology Information’s BLAST application on Magny-Cours, Istanbul, and Nehalem.
According to Wikipedia:
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold.
After running through some background on the move to multiple cores per chip and some of the main architectural features of Magny-Cours, the paper presents some performance results. Here are the machines SI used:
Magny-Cours and Istanbul are installed in units supplied by AMD. Nehalem is installed in a Scalable Informatics JackRabbit high performance, tightly coupled processing and storage unit built for this specific test. CX1-* are Scalable Informatics Pegasus high performance deskside computing units that are used for testing and benchmarking purposes. Units ran 64-bit Linux, with kernels 2.6.27 for Istanbul and Nehalem, 2.6.32.7 for Magny-Cours, and 2.6.32.8 for CX1-1 and CX1-2. Istanbul and Nehalem used the SuSE Linux Enterprise 11 distribution, while the remaining machines used Centos v5.4. SMT was enabled on Nehalem, CX1-1, and CX1-2.
Unlike many marketing-produced “studies” I read, this one gives full details on which versions of the code were run and how they were run; good for them. You’ll want to understand those details to put the results in context, but here is a taste:
At 8 processor cores, the Nehalem architecture leads in performance, though the clock-speed-comparable systems, Nehalem and Istanbul, have a very similar performance. Again, if we had all the systems operating at a similar clock speed, these performance differences would appear to be significantly less. The Intel chips generally clocked faster than the AMD chips, and thus hold an instruction graduation rate advantage.
…Magny-Cours at 8 processor cores are not as fast as Istanbul at 8 processor cores. As we saw previously, this is apparently due entirely to the clock speed ratio. When renormalized to a similar clock speed, the performance difference is within estimated error bars.
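To make the "renormalized to a similar clock speed" idea concrete, here is a minimal Python sketch. It assumes runtime scales inversely with clock frequency, which is my simplification and not necessarily the exact method used in the paper. The function name, the chip labels, and every number below are made-up placeholders, not results from the whitepaper.

```python
def clock_normalized_time(runtime_s, clock_ghz, reference_ghz):
    """Estimate the runtime a chip would show at reference_ghz.

    Assumes runtime is inversely proportional to clock speed,
    which ignores memory and I/O effects.
    """
    return runtime_s * clock_ghz / reference_ghz


# Hypothetical placeholder measurements (NOT from the paper).
runs = {
    "chip_a": {"runtime_s": 100.0, "clock_ghz": 2.0},
    "chip_b": {"runtime_s": 80.0, "clock_ghz": 2.5},
}

reference_ghz = 2.0
normalized = {
    name: clock_normalized_time(r["runtime_s"], r["clock_ghz"], reference_ghz)
    for name, r in runs.items()
}

for name, seconds in normalized.items():
    print(f"{name}: {seconds:.1f} s at {reference_ghz} GHz")
```

With these placeholder inputs the faster-clocked chip finishes first in raw runtime, but both land on the same clock-normalized time. That is the shape of the argument the paper makes about Magny-Cours and Istanbul.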
Joe’s team also looked at the usefulness of Intel’s SMT versus actual physical cores. They found that running 16 SMT threads on 8 physical cores in the Nehalem does improve performance, but the improvement per core is not as great as the gain from adding physical cores. They also observed that Magny-Cours delivers its performance at roughly one third of the power usage per core of the Nehalem.
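One way to think about the SMT observation is speedup per thread (parallel efficiency). The sketch below is my own illustration, not the paper's analysis. The function names are hypothetical and the timings are invented placeholders, chosen only to show the pattern of SMT helping overall while each added thread contributes less.

```python
def speedup(baseline_s, runtime_s):
    """Speedup relative to a single-thread baseline runtime."""
    return baseline_s / runtime_s


def efficiency(baseline_s, runtime_s, threads):
    """Speedup divided by thread count; 1.0 means perfect scaling."""
    return speedup(baseline_s, runtime_s) / threads


# Hypothetical placeholder timings in seconds (NOT from the paper).
single_thread_s = 800.0
eight_physical_s = 100.0   # 8 threads on 8 physical cores
sixteen_smt_s = 80.0       # 16 SMT threads on the same 8 physical cores

eff_physical = efficiency(single_thread_s, eight_physical_s, 8)
eff_smt = efficiency(single_thread_s, sixteen_smt_s, 16)

print(f"8 physical cores: efficiency {eff_physical:.3f}")
print(f"16 SMT threads:   efficiency {eff_smt:.3f}")
```

In this made-up case the 16-thread run is faster in wall-clock terms, yet its per-thread efficiency drops well below that of the 8-core run.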
I recommend you work your way through the paper; it’s relatively short, but a good read.