Data Layout for High Performance

For maximum performance, data must flow efficiently into and out of the vector units. Four aspects of how data is arranged and accessed matter most: data layout, alignment, prefetching, and store operations. “Prefetching is also extremely important in HPC applications that use coprocessors. If the vectors are aligned, then the data can be streamed to the math units very efficiently, with data being prefetched, rather than the system having to load registers from various memory storage.”
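As a rough illustration of the layout and alignment points above, the following C sketch stores two fields as separate contiguous arrays (a structure-of-arrays layout) and allocates each on a 64-byte boundary. The 64-byte figure is an assumption chosen to match a 512-bit vector register; the right value depends on the target hardware. The names `points_soa`, `alloc_aligned_floats`, `points_init`, `points_axpy`, and `is_aligned` are hypothetical and do not come from the original text. Only the standard C11 `aligned_alloc` is used for allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment: 64 bytes (one 512-bit vector register, one typical cache line). */
#define VEC_ALIGN 64

/* Structure of arrays: each field is contiguous in memory, so a single
 * vector load picks up consecutive x values (or consecutive y values). */
typedef struct {
    float *x;
    float *y;
    size_t n;
} points_soa;

/* Returns nonzero if ptr sits on a VEC_ALIGN boundary. */
static int is_aligned(const void *ptr) {
    return ((uintptr_t)ptr % VEC_ALIGN) == 0;
}

/* Allocate n floats on a VEC_ALIGN boundary. aligned_alloc (C11) expects the
 * size to be a multiple of the alignment, so the byte count is rounded up. */
static float *alloc_aligned_floats(size_t n) {
    size_t bytes = n * sizeof(float);
    size_t padded = (bytes + VEC_ALIGN - 1) / VEC_ALIGN * VEC_ALIGN;
    if (padded == 0) {
        padded = VEC_ALIGN;
    }
    return aligned_alloc(VEC_ALIGN, padded);
}

/* Returns 0 on success, -1 if either allocation fails. */
static int points_init(points_soa *p, size_t n) {
    p->x = alloc_aligned_floats(n);
    p->y = alloc_aligned_floats(n);
    p->n = n;
    if (p->x == NULL || p->y == NULL) {
        free(p->x);
        free(p->y);
        p->x = p->y = NULL;
        p->n = 0;
        return -1;
    }
    return 0;
}

static void points_free(points_soa *p) {
    free(p->x);
    free(p->y);
    p->x = p->y = NULL;
    p->n = 0;
}

/* y[i] = a * x[i] + y[i]. Unit-stride access over non-overlapping arrays
 * gives the compiler a straightforward loop to vectorize. */
static void points_axpy(points_soa *p, float a) {
    assert(p != NULL && p->x != NULL && p->y != NULL);
    const float *restrict x = p->x;
    float *restrict y = p->y;
    for (size_t i = 0; i < p->n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

A caller would run `points_init`, fill `x` and `y`, call `points_axpy`, and finish with `points_free`. Whether the loop is actually vectorized, and whether the compiler exploits the alignment, depends on the compiler and its flags, so it is worth confirming with a vectorization report.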