Whitepaper: From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming

May 22, 2011 by Doug Black

This whitepaper evaluates OpenCL as a programming tool for developing performance-portable applications for GPGPUs.

While the Khronos Group developed OpenCL with programming portability in mind, performance is not necessarily portable. OpenCL requires performance-impacting initializations that do not exist in other languages such as CUDA. Understanding these implications allows us to provide a single library with decent performance on a variety of platforms. We choose the triangular solver (TRSM) and matrix multiplication (GEMM) as representative Level 3 BLAS routines to implement in OpenCL. We profile TRSM to get the time distribution of the OpenCL runtime system. We then provide tuned GEMM kernels for both the NVIDIA Tesla C2050 and the ATI Radeon 5870, the latest GPUs offered by the two companies. We explore the benefits of using the texture cache, the performance ramifications of copying data into images, discrepancies between the OpenCL and CUDA compilers’ optimizations, and other issues that affect performance. Experimental results show that nearly 50% of peak performance can be obtained in GEMM on both GPUs in OpenCL. We also show that the performance of these kernels is not highly portable. Finally, we propose using auto-tuning with search heuristics to better explore these kernels’ parameter space.
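To make the auto-tuning idea concrete, here is a minimal sketch of a parameter search over a blocked GEMM. It is not the authors' tuner: it runs in pure Python on the CPU rather than on OpenCL kernels, it tunes a single hypothetical parameter (the tile size), and it uses an exhaustive sweep where the abstract proposes search heuristics. The names `blocked_gemm` and `autotune`, the candidate tile sizes, and the matrix size are all assumptions chosen for illustration.

```python
import random
import time


def blocked_gemm(a, b, n, tile):
    """Return C = A * B for n x n matrices (lists of rows), tile x tile blocks."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Multiply one block of A by one block of B into a block of C.
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c


def autotune(n=48, candidates=(4, 8, 16, 48), repeats=2, seed=0):
    """Time every candidate tile size and return (fastest tile, all timings)."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    timings = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_gemm(a, b, n, tile)
            # Keep the fastest run to reduce the effect of timing noise.
            best = min(best, time.perf_counter() - start)
        timings[tile] = best
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    best_tile, timings = autotune()
    for tile, seconds in sorted(timings.items()):
        print(f"tile={tile:3d}  {seconds * 1e3:8.2f} ms")
    print("fastest tile on this machine:", best_tile)
```

The winning tile size depends on the machine the sweep runs on, which is the same effect the abstract reports for GPU kernels: parameters tuned for one device need not be the best choice on another. A real GPU tuner would search a larger space (for example work-group shape and whether to use images), which is where heuristics become preferable to an exhaustive sweep.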

Comments

Michael Wolfe says

May 27, 2011 at 4:24 pm

Title: “…Towards a Performance-portable Solution…”
Abstract: “We also show that the performance of these kernels is not highly portable.”
????
- RichB says
  
  May 28, 2011 at 8:14 am
  
  I think the Buddha once said, “You can’t move towards something if you’re already there.” Or was it the Love Guru?
