Intel's OpenCL Autovectorization Boosts Performance without User Intervention

Intel’s Nadav Rotem writes that the company’s newly released OpenCL SDK version 1.5 features one improvement that is very important but not always visible to the user: the new Implicit CPU Vectorization module.

What are the benefits of the Implicit CPU Vectorization module? SIMD instructions expose a high level of parallelism and are used to accelerate data-parallel applications in many domains. The 2nd Generation Intel® Core Processor Family, codenamed “Sandy Bridge”, features the Intel® AVX instruction set, which provides 8-wide single-precision floating-point SIMD processing. Applications that take advantage of these SIMD instructions can run up to 8x faster. For example, the Intel® AVX instruction “vaddps” adds 8 floating-point numbers in parallel. The Implicit CPU Vectorization module seamlessly compiles your OpenCL kernels to make full use of this 8-wide floating-point SIMD hardware, boosting the performance of user code without any user intervention.
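To make the idea concrete, here is a minimal data-parallel OpenCL kernel (a hypothetical illustration, not taken from Intel’s post) of the kind the implicit vectorizer targets. Each work-item adds a single pair of floats; the vectorization module can pack 8 consecutive work-items into one 8-wide AVX operation such as vaddps, with no change to the kernel source:

    __kernel void vector_add(__global const float *a,
                             __global const float *b,
                             __global float *c)
    {
        /* One scalar addition per work-item as written; the implicit
           vectorizer may merge 8 consecutive work-items so that a
           single AVX instruction (e.g. vaddps) handles all of them. */
        size_t i = get_global_id(0);
        c[i] = a[i] + b[i];
    }

Without the module, each work-item would typically map to scalar instructions; with it, the same kernel can stride through the arrays 8 floats at a time on a Sandy Bridge CPU.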

Read the Full Story.