In this podcast interview, Jack Dongarra of the University of Tennessee talks to a British radio audience about the five fastest supercomputers in the world. To illustrate the speed of China's new 2.5-petaflop Tianhe-1A system, he uses an analogy based on the huge Neyland Stadium in Tennessee: even against a stadium full of 100,000 fans with laptops, Tianhe has five times more computing power.
The analogy I like from the Nvidia press kit says that a stack of laptops with the aggregate peak power of Tianhe would be 10 times as tall as the Empire State Building.
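For a sense of scale, the stadium comparison can be turned around to see what laptop speed it implies. The short Python calculation below is only a back-of-the-envelope sketch: it uses the 2.5-petaflop and 100,000-fan figures quoted above and assumes that "five times" means a plain factor of five. The variable names are mine, not Dongarra's.

```python
# Figures quoted in the post.
tianhe_peak_flops = 2.5e15   # 2.5 petaflops
fans_with_laptops = 100_000  # a full Neyland Stadium

# Assumption: "five times more" is read as a simple factor of 5.
advantage = 5

# Aggregate speed of all the laptops in the stadium, then of one laptop.
stadium_flops = tianhe_peak_flops / advantage
per_laptop_flops = stadium_flops / fans_with_laptops

print(f"{per_laptop_flops / 1e9:.1f} gigaflops per laptop")
```

Under those assumptions, the analogy works out to roughly 5 gigaflops per laptop.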
Coming Attractions: On Nov. 3, Jack Dongarra will present "Faster, Cheaper, Better – a Hybridization Methodology to Develop Linear Algebra Software for GPUs" in a special webcast from the GPU Computing Research Forum at 10:00 Central Time.
In this Forum, I will present how to develop faster, cheaper and better linear algebra software for GPUs through a hybridization methodology that is built on (1) Representing linear algebra algorithms as directed acyclic graphs where nodes correspond to tasks and edges to dependencies among them, and (2) Scheduling the execution of the tasks over hybrid architectures of GPUs and multicore. The methodology is successfully used in MAGMA – a new generation of linear algebra libraries, similar in functionality to LAPACK, but extended for hybrid, GPU-based systems. Complex algorithms can be expressed through sequential-like code, based on computational tasks that are often already available, e.g., through optimized CPU/GPU BLAS, LAPACK, PLASMA, etc. libraries. Data-driven parallelism can then be extracted (implicitly) from the high-level description of the algorithm using run-time systems (e.g., DAGuE, StarPU, Quark, etc.) for scheduling the execution of the different tasks over the hybrid processing units. Resulting productivity is then fast and cheap, as the high-level development uses existing software infrastructure. Moreover, the resulting hybrid algorithms are better performance-wise than corresponding homogeneous algorithms designed exclusively for either GPUs or homogeneous multicore CPUs.
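To make the two steps in the abstract more concrete, here is a minimal Python sketch of the general idea. It is not MAGMA code and does not use DAGuE, StarPU, or Quark. It writes a tiled Cholesky factorization as a sequential list of tasks, derives the dependency graph from the tiles each task reads and writes, and groups the tasks into waves that could run concurrently. The helper names (`cholesky_tasks`, `build_dag`, `schedule_waves`, `place`) and the rule that sends the small diagonal factorizations to the CPU and everything else to the GPU are illustrative assumptions.

```python
from collections import defaultdict


def cholesky_tasks(n):
    """List tiled-Cholesky tasks in sequential order as (name, reads, writes)."""
    tasks = []
    for k in range(n):
        tasks.append((f"POTRF({k})", {(k, k)}, {(k, k)}))
        for i in range(k + 1, n):
            tasks.append((f"TRSM({i},{k})", {(k, k), (i, k)}, {(i, k)}))
        for i in range(k + 1, n):
            tasks.append((f"SYRK({i},{k})", {(i, k), (i, i)}, {(i, i)}))
            for j in range(k + 1, i):
                tasks.append(
                    (f"GEMM({i},{j},{k})", {(i, k), (j, k), (i, j)}, {(i, j)})
                )
    return tasks


def build_dag(tasks):
    """Derive dependencies (task index -> set of earlier indices) from tile accesses."""
    last_writer = {}
    readers_since_write = defaultdict(list)
    deps = {}
    for idx, (_name, reads, writes) in enumerate(tasks):
        needed = set()
        for tile in reads:
            if tile in last_writer:
                needed.add(last_writer[tile])        # read-after-write
        for tile in writes:
            if tile in last_writer:
                needed.add(last_writer[tile])        # write-after-write
            needed.update(readers_since_write[tile])  # write-after-read
        deps[idx] = needed
        for tile in reads:
            readers_since_write[tile].append(idx)
        for tile in writes:
            last_writer[tile] = idx
            readers_since_write[tile] = []
    return deps


def schedule_waves(deps):
    """Group tasks into waves; every task in a wave has all dependencies finished."""
    level = {}
    for t in sorted(deps):  # dependencies always point to earlier indices
        level[t] = 1 + max((level[d] for d in deps[t]), default=-1)
    waves = defaultdict(list)
    for t, lv in level.items():
        waves[lv].append(t)
    return [waves[lv] for lv in sorted(waves)]


def place(name):
    """Hypothetical placement rule: diagonal factorizations on CPU, updates on GPU."""
    return "cpu" if name.startswith("POTRF") else "gpu"


tasks = cholesky_tasks(3)
deps = build_dag(tasks)
waves = schedule_waves(deps)
for step, wave in enumerate(waves):
    print(step, [(tasks[t][0], place(tasks[t][0])) for t in wave])
```

For a 3-by-3 grid of tiles, the ten tasks fall into seven waves, and the third wave holds three independent update tasks that could run side by side. A real run-time system would dispatch each task as soon as its inputs are ready rather than waiting for a whole wave to finish, but the dependency analysis follows the same principle.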