Intel has been careful to label the Xeon Phi as a coprocessor, something that always pairs with a Xeon CPU. But how do the two compare on real applications? Over at the Xcelerit Blog, Paul Sutton benchmarks both devices using an optimized parallel version of the Monte Carlo LIBOR swaption portfolio pricer.
The pricer is executed once on the host CPUs (the Sandy Bridge processors), and again on the Xeon Phi coprocessor in offload mode. The execution time of the full application is measured, including data transfers, random number generation, and reduction. All of these steps run on the target processor.
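To make the measurement scope concrete, here is a minimal sketch of timing a complete Monte Carlo run, with generator setup, path simulation, and reduction all inside the timed region. It is not the LIBOR swaption pricer from the benchmark: a single European call under geometric Brownian motion stands in for it, and the function names and parameter values are assumptions chosen for illustration. There is no data transfer here because everything runs on one processor.

```python
import math
import random
import time


def price_call_mc(num_paths, seed=42, s0=100.0, strike=100.0,
                  rate=0.05, vol=0.2, maturity=1.0):
    """Toy stand-in pricer: European call under geometric Brownian motion."""
    # Generator setup is part of the measured work, as in the benchmark.
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)

    payoff_sum = 0.0
    for _ in range(num_paths):
        z = rng.gauss(0.0, 1.0)                      # random number generation
        s_t = s0 * math.exp(drift + diffusion * z)   # simulate one path
        payoff_sum += max(s_t - strike, 0.0)         # reduction over paths

    # Discounted average payoff.
    return math.exp(-rate * maturity) * payoff_sum / num_paths


def timed_run(num_paths):
    """Time the full run: setup, simulation, and reduction together."""
    start = time.perf_counter()
    price = price_call_mc(num_paths)
    elapsed = time.perf_counter() - start
    return price, elapsed


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        price, elapsed = timed_run(n)
        print(f"{n:>7} paths: price={price:.4f}, time={elapsed:.4f}s")
```

Timing the whole function, rather than only the inner loop, is what lets fixed costs such as generator setup show up in the result at small path counts.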
According to the results, from about 100K paths onwards the Intel Xeon Phi becomes faster than the Sandy Bridge processors, reaching a speedup of nearly 3x at 1M paths. With lower numbers of paths, the Sandy Bridge outperforms the Phi. This can be explained by the added data transfers and the comparatively low level of parallelism for a low number of paths (considering both vectorization and multi-threading). The setup time for the random number generator also becomes more dominant on the Xeon Phi when there is relatively little computation performed.
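The crossover can be illustrated with a simple cost model: the host pays only a per-path cost, while the offloaded run pays a fixed overhead (transfers, generator setup) plus a smaller per-path cost. The sketch below is an assumption-laden illustration, not the author's analysis. The function names are hypothetical, and the numbers in the usage example are invented to show the shape of the curve; they are not measurements from the benchmark. The model also ignores the reduced parallelism at low path counts, so it captures only the fixed-overhead part of the explanation.

```python
def host_time(num_paths, per_path_host):
    """Modeled host run time: cost grows linearly with the number of paths."""
    return num_paths * per_path_host


def offload_time(num_paths, fixed_overhead, per_path_device):
    """Modeled offload run time: fixed overhead plus a linear per-path cost."""
    return fixed_overhead + num_paths * per_path_device


def break_even_paths(fixed_overhead, per_path_host, per_path_device):
    """Path count at which the offloaded run matches the host run.

    Returns None when the device is not faster per path, because the
    fixed overhead can then never be amortized.
    """
    if per_path_device >= per_path_host:
        return None
    return fixed_overhead / (per_path_host - per_path_device)


if __name__ == "__main__":
    # Invented illustrative values (seconds), NOT benchmark measurements.
    overhead = 0.2
    host_cost = 3e-6
    device_cost = 1e-6

    print("break-even paths:", break_even_paths(overhead, host_cost, device_cost))
    for n in (10_000, 100_000, 1_000_000):
        speedup = host_time(n, host_cost) / offload_time(n, overhead, device_cost)
        print(f"{n:>9} paths: modeled speedup = {speedup:.2f}x")
```

Below the break-even point the fixed overhead dominates and the host wins; above it the speedup grows toward the ratio of the two per-path costs.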
Read the Full Story.