Intel's OpenCL Autovectorization Boosts Performance without User Intervention


Intel’s Nadav Rotem writes that the company’s newly released OpenCL SDK version 1.5 features one improvement that is very important but not always visible to the user: the new Implicit CPU Vectorization module.

What are the benefits of the implicit CPU vectorization module? SIMD instructions expose a high degree of parallelism and are used to accelerate data-parallel applications in many domains. The 2nd Generation Intel® Core Processor Family, codenamed "Sandy Bridge," features the Intel® AVX instruction set, which provides 8-wide floating-point SIMD processing. Applications that take advantage of these SIMD instructions can run as much as 8x faster. For example, the Intel® AVX instruction "vaddps" adds 8 floating-point numbers in parallel. The implicit CPU vectorization module seamlessly compiles your OpenCL kernels to fully utilize this 8-wide floating-point SIMD hardware, boosting the performance of user code without user intervention.
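To see the kind of code the vectorizer targets, consider a minimal scalar OpenCL kernel (the kernel name and argument names below are illustrative, not taken from the SDK). Each work-item performs a single floating-point addition; the implicit vectorizer can pack 8 adjacent work-items together so that the addition maps onto one 8-wide AVX operation such as vaddps, with no changes to the kernel source.

    // Illustrative scalar OpenCL C kernel (names are hypothetical).
    // Each work-item adds one element; the implicit CPU vectorizer can
    // merge 8 consecutive work-items into a single 8-wide AVX addition.
    __kernel void vec_add(__global const float *a,
                          __global const float *b,
                          __global float *c)
    {
        size_t i = get_global_id(0);
        c[i] = a[i] + b[i];
    }

The point is that the kernel is written as plain scalar code per work-item; the packing into AVX lanes happens automatically when the SDK compiles the kernel for the CPU.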

Read the Full Story.