SC16 has announced the winner of its Test of Time Award. This year's winning paper is “Automatically Tuned Linear Algebra Software” by Clint Whaley and Jack Dongarra. The paper, which has received hundreds of citations and continues to be cited, describes ATLAS, an autotuning, optimized implementation of the Basic Linear Algebra Subprograms (BLAS). From the paper's abstract:
This paper describes an approach for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units. The production of such software for machines ranging from desktop workstations to embedded processors can be a tedious and time consuming process. The work described here can help in automating much of this process. We will concentrate our efforts on the widely used linear algebra kernels called the Basic Linear Algebra Subroutines (BLAS). In particular, the work presented here is for general matrix multiply, DGEMM. However much of the technology and approach developed here can be applied to the other Level 3 BLAS and the general strategy can have an impact on basic linear algebra operations in general and may be extended to other important kernel operations.
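To make the idea of empirical autotuning more concrete, here is a toy sketch in Python. It is not ATLAS's code: ATLAS generates and times compiled kernels and searches over several parameters, whereas this sketch tunes a single parameter, the tile size `nb`, for a blocked DGEMM (C ← αAB + βC) and keeps whichever candidate runs fastest on the current machine. All function names here (`blocked_dgemm`, `autotune_block_size`, and so on) are hypothetical and chosen for illustration.

```python
import random
import time


def blocked_dgemm(alpha, A, B, beta, C, nb):
    """Toy DGEMM: C <- alpha*A*B + beta*C for square n x n lists of lists.

    The loops walk over nb x nb tiles; nb is the (hypothetical) tuning knob.
    Edge tiles are clipped, so n need not be a multiple of nb.
    """
    n = len(A)
    # Apply beta once, before accumulating alpha*A*B into C.
    for i in range(n):
        Ci = C[i]
        for j in range(n):
            Ci[j] *= beta
    for ii in range(0, n, nb):
        for kk in range(0, n, nb):
            for jj in range(0, n, nb):
                # Multiply one tile of A by one tile of B into one tile of C.
                for i in range(ii, min(ii + nb, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + nb, n)):
                        a = alpha * Ai[k]
                        Bk = B[k]
                        for j in range(jj, min(jj + nb, n)):
                            Ci[j] += a * Bk[j]
    return C


def reference_dgemm(alpha, A, B, beta, C):
    """Straightforward triple loop used to check tuned candidates."""
    n = len(A)
    return [[alpha * sum(A[i][k] * B[k][j] for k in range(n)) + beta * C[i][j]
             for j in range(n)] for i in range(n)]


def random_matrix(n, rng):
    return [[rng.random() for _ in range(n)] for _ in range(n)]


def close(X, Y, tol=1e-9):
    """Element-wise comparison with a relative tolerance."""
    return all(abs(x - y) <= tol * max(1.0, abs(y))
               for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def autotune_block_size(n, candidates, repeats=3, seed=0):
    """Empirical search: time each candidate tile size on this machine and
    keep the fastest one that also produces a correct result."""
    rng = random.Random(seed)
    A, B = random_matrix(n, rng), random_matrix(n, rng)
    expected = reference_dgemm(1.0, A, B, 0.0, [[0.0] * n for _ in range(n)])
    timings = {}
    for nb in candidates:
        best = float("inf")
        for _ in range(repeats):
            C = [[0.0] * n for _ in range(n)]
            start = time.perf_counter()
            blocked_dgemm(1.0, A, B, 0.0, C, nb)
            best = min(best, time.perf_counter() - start)
        if close(C, expected):  # reject candidates that compute a wrong answer
            timings[nb] = best
    best_nb = min(timings, key=timings.get)
    return best_nb, timings


if __name__ == "__main__":
    nb, timings = autotune_block_size(48, [4, 8, 16, 24, 48])
    for size, t in sorted(timings.items()):
        print(f"nb={size:3d}: {t * 1e3:.2f} ms")
    print("selected nb =", nb)
```

In pure Python, interpreter overhead swamps cache effects, so the winning `nb` here says little about the memory hierarchy. What the sketch shows is the shape of the search: parameterize candidate kernels, check each for correctness, time them on the target machine, and keep the best. In compiled code, where the blocking factor interacts directly with the caches, that choice matters far more.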
ATLAS has been part of the scientific software ecosystem since it was first released. Despite the availability of many vendor-supplied BLAS implementations, ATLAS remains an important performance reference point, and its autotuning strategies have inspired other research teams doing similar work.