Microsoft’s Satnam Singh writes about the company’s new Accelerator System, which allows certain kinds of data-parallel descriptions to be written once and then executed on three different targets: GPUs, multicore processors using SSE3 vector instructions, and FPGA circuits.
In general, we cannot hope to devise one language or system for programming heterogeneous systems that allows us to compile a single source into efficient implementations on wildly different computing elements such as CPUs, GPUs, and FPGAs. Such parallel-performance portability is difficult to achieve. If the problem domain is sufficiently constrained, however, it is possible to achieve good parallel performance from a single source description. Accelerator achieves this by constraining the data types used for parallel programming (to whole arrays that cannot be explicitly indexed) and by providing a restricted set of parallel array access operations (e.g., in order, in reverse, with a stride, shifted, transposed).
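To make the idea of whole arrays with restricted access patterns concrete, here is a minimal sketch in plain Python. It is not the Accelerator API: the `WholeArray` class, its method names, and the choice to clamp shifts at the array edges are all assumptions made for illustration, and the transposed access pattern is left out because the sketch is one-dimensional. The point is only that a computation such as a three-point smoothing filter can be written without ever indexing an element, which is what would leave a backend free to choose its own parallel implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WholeArray:
    """Hypothetical whole-array value; no element indexing is exposed."""

    _data: Tuple[float, ...]

    # Element-wise operations: each output element depends only on the
    # matching input elements, so they could all be computed in parallel.
    def __add__(self, other: WholeArray) -> WholeArray:
        return WholeArray(tuple(a + b for a, b in zip(self._data, other._data)))

    def scale(self, k: float) -> WholeArray:
        return WholeArray(tuple(k * a for a in self._data))

    # Restricted access patterns of the kind named in the text.
    def reverse(self) -> WholeArray:
        return WholeArray(self._data[::-1])

    def stride(self, step: int) -> WholeArray:
        return WholeArray(self._data[::step])

    def shift(self, offset: int) -> WholeArray:
        """Result element i is input element i + offset.

        Out-of-range positions reuse the nearest edge element
        (an assumption of this sketch, not a documented behavior).
        """
        n = len(self._data)
        return WholeArray(
            tuple(self._data[min(max(i + offset, 0), n - 1)] for i in range(n))
        )

    def to_list(self) -> list:
        return list(self._data)


def smooth(x: WholeArray) -> WholeArray:
    """Three-point weighted average built only from shifts and element-wise ops."""
    return x.shift(-1).scale(0.25) + x.scale(0.5) + x.shift(1).scale(0.25)


if __name__ == "__main__":
    signal = WholeArray((1.0, 2.0, 3.0, 4.0))
    print(smooth(signal).to_list())
```

Because `smooth` is expressed purely as shifted copies combined element by element, the same description could in principle be mapped to vector instructions, GPU threads, or a pipelined circuit; the Python implementation here simply evaluates it sequentially.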