The recently published Department of Energy FY 2018 Congressional Budget Request has raised a lot of questions about the Aurora supercomputer that was scheduled to be deployed at the Argonne Leadership Computing Facility (ALCF) next year. In fact, the massive 13-megawatt machine was suddenly missing from the budget entirely.
We asked Intel about this when it first came to light in May, and they had no comment. At that time, Rick Borchelt from the DoE provided this official statement:
“On the record, Aurora contract is not cancelled.”
Not cancelled. Just missing.
As we covered in our Radio Free HPC podcast, Aurora appears to be morphing into a very different kind of machine. The budget document actually gave us the clue:
The ALCF upgrade project will shift toward an advanced architecture, particularly well-suited for machine learning applications capable of more than an exaflop performance when delivered. This will impact site preparations and requires significant new non-recurring engineering efforts with the vendor to develop features that meet ECI requirements and that are architecturally diverse from the OLCF exascale system.
This week, Paul Messina from the Exascale Computing Project confirmed what we thought all along:
insideHPC: What is the status of the Aurora system that was supposed to come to Argonne?
Paul Messina: I believe that the Aurora system contract is being reviewed for potential changes that would result in a subsequent system in a different time frame from the original Aurora system. But since that’s just early negotiations, I don’t think we can be any more specific on that.
Just watch; Aurora will morph into the “novel architecture” U.S. exascale machine that the Exascale Computing Project now plans to deploy at Argonne in 2021. Targeted at AI, the machine will likely be tailored for workloads that do not require 64-bit precision. That means no Linpack run, folks.
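For readers wondering why a reduced-precision design would mean no Linpack number: HPL validates its solution with a scaled residual tied to 64-bit machine epsilon, while machine learning workloads routinely train in 16- or 32-bit arithmetic. The short NumPy sketch below (purely illustrative, and not tied to any actual Aurora design details) shows how much coarser half precision is:

```python
import numpy as np

# Machine epsilon for each floating-point format: the relative rounding
# granularity the arithmetic can resolve. HPL (Linpack) checks its solution
# against a residual bound derived from 64-bit epsilon, which is why a
# machine weak in FP64 has little to show on that benchmark.
for dtype in (np.float16, np.float32, np.float64):
    info = np.finfo(dtype)
    print(f"{np.dtype(dtype).name}: ~{info.precision} decimal digits, eps = {info.eps}")

# Training-style arithmetic tolerates coarse rounding; a direct dense solve
# generally does not. In half precision, a small update simply vanishes:
print(np.float16(1.0) + np.float16(1e-4))   # -> 1.0 (rounded away)
print(np.float64(1.0) + np.float64(1e-4))   # -> 1.0001
```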
We’ll keep you posted as this story unfolds.
I wonder if they are giving Intel more time to add the Knights Mill AVX-512 machine learning enhancements to Knights Hill, and for the 10 nm node to be ready for large, high-powered dies. Such a system could fit the description of an “advanced architecture, particularly well-suited for machine learning applications capable of more than an exaflop performance when delivered” that is “architecturally diverse from the OLCF exascale system,” as long as they boost the double-precision peak throughput to more than 250 petaFLOPS.
I guess another possibility is a Lake Crest/Knights Crest system where the x86 cores are used for simulation and the Nervana Engine is used for machine learning. I just don’t see them moving away from Intel, particularly because they want something “architecturally diverse from the OLCF exascale system” and they want something that can run traditional HPC code and also accelerate machine learning.