PreFetch for Intel Xeon Phi – Part 2

“An interesting aspect of prefetching is how far ahead of the data currently in use the next prefetch is issued. This is a critical parameter for success. It can be defined as the number of loop iterations ahead at which a prefetch instruction is issued, and is referred to as the prefetch distance. The compiler determines this distance automatically, and the values it chooses can be seen in the compiler optimization reports.”
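To make the idea of a distance concrete, here is a minimal C sketch of manual software prefetching. It assumes GCC or Clang for `__builtin_prefetch`. The helper `prefetch_distance` is hypothetical and applies a common rule of thumb: issue the prefetch enough iterations ahead to cover the memory latency. In practice the compiler picks the distance itself, as the excerpt notes.

```c
#include <stddef.h>

/* Hypothetical rule-of-thumb helper: the number of loop iterations needed
   to cover the memory latency, rounded up. cycles_per_iter must be nonzero. */
size_t prefetch_distance(size_t latency_cycles, size_t cycles_per_iter)
{
    return (latency_cycles + cycles_per_iter - 1) / cycles_per_iter;
}

/* Sum an array while prefetching the element `dist` iterations ahead of the
   one being added. __builtin_prefetch (GCC/Clang) is only a hint; the bounds
   check keeps the computed address inside the array. */
double sum_with_prefetch(const double *a, size_t n, size_t dist)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (dist < n && i < n - dist)
            __builtin_prefetch(&a[i + dist], 0 /* read */, 3 /* high locality */);
        sum += a[i];
    }
    return sum;
}
```

For example, with a hypothetical 300-cycle latency and about 4 cycles of work per iteration, `prefetch_distance(300, 4)` returns 75. A call to `sum_with_prefetch(a, n, 75)` would then request `a[i + 75]` while adding `a[i]`. If the distance is too short, the data arrives late. If it is too long, the data may be evicted before it is used.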

Prefetching Data for Intel Xeon Phi

“Prefetching on a coprocessor such as the Intel Xeon Phi coprocessor can be more important than on a host CPU such as an Intel Xeon processor. Because the cores on the Intel Xeon Phi coprocessor execute in order, they cannot hide memory latency as well as an out-of-order CPU can. In addition, because the coprocessor has no L3 cache, an L2 miss must be serviced by the slower memory subsystem.”
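Because an L2 miss on the coprocessor is so costly, one pattern worth sketching is two-stage prefetching: first far ahead into the outer cache, then a short distance ahead into L1. This pattern is an assumption here, not something the excerpt spells out. The C sketch below uses GCC/Clang `__builtin_prefetch` locality hints, and the distances `L2_DIST` and `L1_DIST` are hypothetical. How each hint maps to a cache level depends on the target; on x86, GCC typically emits `prefetcht1` for hint 2 and `prefetcht0` for hint 3.

```c
#include <stddef.h>

/* Hypothetical distances, in loop iterations: a long one to bring data from
   memory toward the outer cache, a short one to pull it into L1. */
#define L2_DIST 64
#define L1_DIST 8

/* y[i] = s * x[i], with two prefetches per iteration at different distances. */
void scale_two_level(float *y, const float *x, float s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i + L2_DIST < n)
            __builtin_prefetch(&x[i + L2_DIST], 0, 2); /* lower locality: outer cache */
        if (i + L1_DIST < n)
            __builtin_prefetch(&x[i + L1_DIST], 0, 3); /* highest locality: L1 */
        y[i] = s * x[i];
    }
}
```

The far prefetch hides most of the trip to memory. The near prefetch only has to cover the shorter L2-to-L1 latency, which matters on in-order cores that stall on each miss.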

Slidecast: Numascale Achieves Record STREAM Benchmark

In this slidecast, Einar Rustad of Numascale describes how the company achieved a world record on the McCalpin STREAM benchmark using its innovative scale-out to scale-up architecture. The benchmark measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels.
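For reference, STREAM's Triad kernel computes `a[i] = b[i] + q * c[i]`, and the benchmark credits it with 24 bytes of memory traffic per element: two 8-byte reads and one 8-byte write. The minimal C sketch below shows the kernel and that byte accounting. It omits the timing, repetition, and array-size rules of the real benchmark, and the function names are hypothetical.

```c
#include <stddef.h>

/* STREAM-style Triad kernel: a[i] = b[i] + q * c[i]. */
void stream_triad(double *a, const double *b, const double *c, double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
}

/* Bytes credited to one Triad pass over n elements (read b and c, write a).
   Sustained bandwidth in bytes/s is triad_bytes(n) divided by elapsed seconds. */
size_t triad_bytes(size_t n)
{
    return 3 * sizeof(double) * n;
}
```

For example, `triad_bytes(1000)` is 24000 bytes. Dividing this figure by the measured time of a pass gives the bandwidth number that STREAM reports for Triad.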