Optimizing GPU Performance by Mapping Memory Hierarchy


Patrick Viry writes that the Ateji PX approach to GPU programming makes it easy to map the GPU memory hierarchy, which is essential for achieving good performance on GPUs or on any hardware with non-uniform memory access (NUMA).

What is important here is that the memory hierarchy is expressed in terms of lexical scope. A variable from a different level in the memory hierarchy is accessible if and only if it is visible in the lexical scope. In contrast, languages such as OpenCL use specific declaration modifiers to locate variables in different memory areas. With this approach, you can, for instance, have a variable declared in the global lexical scope but labeled with the __private modifier. It looks like there is only one variable, while actually each kernel has its own copy. Such modifiers make it very hard to understand the logic of the code.
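The idea of tying memory levels to lexical scope can be illustrated with a small CPU-only analogy in plain C++. This is a sketch under stated assumptions, not Ateji PX syntax and not GPU code: the sequential loops stand in for parallel work-groups and work-items, and the names `sum_of_squares`, `groups`, `items_per_group`, `global_out`, `group_scratch` and `private_value` are all hypothetical. Each nesting level plays the role of one memory level, so where a variable is declared tells you how many copies of it exist.

```cpp
#include <cstddef>
#include <vector>

// CPU-only analogy (hypothetical names): nesting depth stands in for the
// level of the GPU memory hierarchy in which a variable would live.
long sum_of_squares(std::size_t groups, std::size_t items_per_group) {
    // Outermost scope: plays the role of global memory.
    // There is exactly one copy, visible to every group and every item.
    std::vector<long> global_out(groups, 0);

    for (std::size_t g = 0; g < groups; ++g) {
        // Group scope: plays the role of per-group (local/shared) memory.
        // One copy per group, visible only to the items of this group.
        std::vector<long> group_scratch(items_per_group, 0);

        for (std::size_t i = 0; i < items_per_group; ++i) {
            // Item scope: plays the role of private memory.
            // One copy per item, invisible outside this block.
            long private_value = static_cast<long>(g * items_per_group + i);
            group_scratch[i] = private_value * private_value;
        }

        // private_value is out of scope here, so it cannot be read by mistake.
        long group_total = 0;
        for (long v : group_scratch) {
            group_total += v;
        }
        global_out[g] = group_total;
    }

    // group_scratch is out of scope here; only global_out remains visible.
    long total = 0;
    for (long v : global_out) {
        total += v;
    }
    return total;
}
```

Calling `sum_of_squares(2, 4)` walks two groups of four items each. The point of the sketch is that the compiler's ordinary scoping rules enforce which level can see which variable, whereas with address-space modifiers the declaration site alone does not tell you how many copies exist.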

Read the Full Story.