Heterogeneous Streams with Intel Xeon Phi

Matrix multiplies can be decomposed into tiles and executed very fast on the latest generations of coprocessors. Intel has developed the hStreams library that supports task concurrency on heterogeneous platforms. The concurrency may be across nodes (Xeon, KNC, KNL-SB, KNL-LB); within a node for small matrix operations; and in the overlapping of computation and communication, particularly for tiled solutions. It relieves the user of complexity in dealing with thread affinitization, offloading, memory types, and memory affinitization.