In this video, Josh Mora from AMD presents: Do Theoretical FLOPS Matter for Real Application Performance? Recorded at the HPC Advisory Council Spain Workshop 2012 in Malaga.
Do Theoretical FLOPS Matter for Real Application Performance? The most intelligent answer to this question is: “it depends on the application.” To validate this experimentally, the work uses a modified AMD processor named Fangio (AMD Opteron 6275 Processor), whose floating-point capability is limited to 2 FLOPs/clk/BD unit. On average, it delivers about 8% less performance than the AMD Opteron 6276 Processor, which has four times the floating-point capability, i.e., 8 FLOPs/clk/BD unit.
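The gap between the two numbers in the abstract is easier to see side by side. The following is a minimal sketch using only the figures quoted above (2 vs. 8 FLOPs/clk/BD unit and the -8% average). The function names are hypothetical, and the comparison assumes both processors run at the same clock with the same number of BD units, which the abstract does not state.

```python
# Figures quoted in the abstract; all names below are illustrative.
FLOPS_PER_CLK_FANGIO = 2   # Opteron 6275 "Fangio", FLOPs/clk/BD unit
FLOPS_PER_CLK_6276 = 8     # Opteron 6276, FLOPs/clk/BD unit
AVG_PERF_DELTA = -0.08     # Fangio vs. 6276, average over applications


def peak_ratio(flops_a: float, flops_b: float) -> float:
    """Theoretical peak of part A relative to part B.

    Assumes equal clock frequency and an equal number of BD units,
    so only the FLOPs-per-clock figures differ.
    """
    return flops_a / flops_b


def relative_performance(delta: float) -> float:
    """Turn a fractional difference (e.g. -0.08) into a relative speed."""
    return 1.0 + delta


theoretical = peak_ratio(FLOPS_PER_CLK_FANGIO, FLOPS_PER_CLK_6276)
observed = relative_performance(AVG_PERF_DELTA)

print(f"Theoretical peak, Fangio vs. 6276:     {theoretical:.2f}x")
print(f"Average application perf, same pair:   {observed:.2f}x")
```

Under those assumptions, Fangio has a quarter of the theoretical peak yet keeps most of the measured application performance, which is the mismatch the talk sets out to explain.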
The goals of this work are to:
- Demonstrate that the FLOPs/clk/core of a microprocessor architecture isn’t necessarily a good performance indicator, despite its heavy use by the industry.
- Show that compiler code vectorization is fundamental to extracting as much real application performance as possible, but that it still has a long way to go.
- Point out that it would not be fair to blame compiler technology exclusively: algorithms are not designed and written well enough for compilers to exploit vector instructions (i.e., SSE, AVX, and FMA).