Mark Harris on Using Shared Memory in CUDA C/C++

January 29, 2013 by Doug Black

Over at the Parallel for All blog, Mark Harris writes that shared memory is a powerful feature for writing well-optimized CUDA code. Access to shared memory is much faster than global memory access because shared memory is located on chip.

Because shared memory is shared by the threads in a thread block, it gives those threads a mechanism to cooperate. One way to exploit that cooperation is to enable global memory coalescing, as Harris demonstrates with an array-reversal example in his post. By reversing the array in shared memory, all global memory reads and writes are performed with unit stride, achieving full coalescing on any CUDA GPU.
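To make the coalescing argument concrete, here is a minimal CUDA sketch (not the code from Harris's post) that reverses an array whose length is assumed to be a multiple of a tile size `BLOCK`, set here to 256. Each block reads one contiguous tile with unit stride, reverses it inside shared memory, and writes it with unit stride to the mirrored tile position in the output. The names `reverseTiles`, `reverseOnDevice` and `BLOCK` are hypothetical, and error checking of the CUDA runtime calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Threads per block and shared-memory tile size (illustrative choice).
// Assumption for this sketch: the array length is a multiple of BLOCK.
constexpr int BLOCK = 256;

// Out-of-place reversal of d_in into d_out.
// Thread t reads element inBase + t and writes element outBase + t, so both
// global loads and global stores are unit stride and coalesce. The actual
// reordering happens in on-chip shared memory.
__global__ void reverseTiles(const int *d_in, int *d_out)
{
    __shared__ int tile[BLOCK];

    int t = threadIdx.x;
    int inBase = blockIdx.x * BLOCK;
    // Block b's tile belongs at the mirrored tile position in the output.
    int outBase = (gridDim.x - 1 - blockIdx.x) * BLOCK;

    tile[t] = d_in[inBase + t];               // coalesced read
    __syncthreads();                          // whole tile is loaded before any thread reads it
    d_out[outBase + t] = tile[BLOCK - 1 - t]; // coalesced write, reversed within the tile
}

// Hypothetical host helper: copies `in` to the device, launches one block
// per tile, and returns the reversed array (empty if the size is unsupported).
std::vector<int> reverseOnDevice(const std::vector<int> &in)
{
    int n = static_cast<int>(in.size());
    if (n == 0 || n % BLOCK != 0) {
        return {};
    }
    size_t bytes = static_cast<size_t>(n) * sizeof(int);

    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);

    reverseTiles<<<n / BLOCK, BLOCK>>>(d_in, d_out);

    std::vector<int> out(n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

Calling `reverseOnDevice` on a vector holding 0 through 511 should return 511 down to 0. The `__syncthreads()` barrier is what makes the cooperation safe: without it, a thread could read `tile[BLOCK - 1 - t]` before the thread responsible for that slot has written it. Writing to a separate output buffer avoids races between blocks, since one block's output tile is another block's input tile.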
