Over at the Nvidia Developer Zone, Mark Harris looks at how to efficiently access device memory, in particular global memory, from within kernels.
Global memory access on the device shares performance characteristics with data access on the host: data locality is very important. On early CUDA hardware, memory access alignment mattered as much as locality across threads, but on recent hardware alignment is not much of a concern. Strided memory access, on the other hand, can hurt performance, which can be alleviated by staging data through on-chip shared memory. In the next post we will explore shared memory in detail, and in the post after that we will show how to use shared memory to avoid strided global memory accesses during a matrix transpose.
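To make the strided-access point concrete, here is a minimal sketch (not taken from the linked post) of two CUDA kernels: one with unit-stride access, where consecutive threads in a warp read consecutive elements and the loads coalesce into a few memory transactions, and one with strided access, where each warp's loads are scattered across many cache lines. The array size and stride value are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

// Unit-stride copy: thread i touches element i, so a warp's 32 loads
// fall in consecutive addresses and coalesce well.
__global__ void copy_coalesced(float *out, const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided copy: thread i touches element i * stride, spreading a warp's
// loads across many memory transactions as the stride grows.
__global__ void copy_strided(float *out, const float *in, int n, int stride) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n) out[i] = in[i];
}

int main() {
    const int n = 1 << 22;               // illustrative array size
    float *in, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    const int threads = 256;
    copy_coalesced<<<(n + threads - 1) / threads, threads>>>(out, in, n);

    const int stride = 32;               // worst case: one transaction per element
    copy_strided<<<(n / stride + threads - 1) / threads, threads>>>(out, in, n, stride);

    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Timing the two kernels (e.g. with CUDA events or a profiler) shows effective bandwidth dropping as the stride grows, which is the behavior shared memory staging is used to avoid.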
Read the Full Story.