According to this report at Xbit Laboratories, Nvidia revealed details of its exascale development project, known as Echelon, at SC10 last week. The company is convinced that the road to exascale runs through heterogeneous systems, though GPUs will have to evolve radically and rapidly to reach exascale performance in the 2018 – 2020 timeframe.
According to Steve Keckler, the director of architecture research at Nvidia, the Echelon design incorporates a large number (~1024) of stream cores and a smaller number (~8) of latency-optimized CPU-like cores on a single chip, sharing a common memory system. Just as in current architectures, eight stream cores will form a streaming multiprocessor (SM), and 128 SMs will form the large pool of throughput-optimized processing elements. Such a chip could deliver 20 teraFLOPS in double precision, and a number of them will form a 2.6 petaFLOPS rack. At present, Nvidia's Fermi (GF110) chip, with 512 stream processors operating at 1544MHz, can deliver 0.79 TFLOPS of DP compute performance. Considering the roughly 25-fold difference in performance, it is highly likely that Echelon will employ a post-Maxwell (~2013 – 2014) Nvidia GPU design.
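The figures quoted above can be checked with some quick arithmetic. The Python sketch below is only a back-of-the-envelope check, not anything from Nvidia. It assumes that a fused multiply-add counts as two floating-point operations and that the Fermi part runs double precision at half its single-precision rate, which is what the quoted 0.79 TFLOPS figure implies. The chips-per-rack number is an inference from dividing the rack figure by the per-chip figure. It is not stated in the report.

```python
# Back-of-the-envelope check of the figures quoted in the article.

# Echelon figures as reported
CORES_PER_SM = 8            # stream cores per streaming multiprocessor
SMS_PER_CHIP = 128          # SMs per chip
ECHELON_CHIP_TFLOPS = 20.0  # double-precision peak per chip
RACK_PFLOPS = 2.6           # double-precision peak per rack

# Fermi (GF110) figures as reported
FERMI_CORES = 512
FERMI_CLOCK_GHZ = 1.544

# Assumptions (not stated in the article)
FLOPS_PER_FMA = 2           # one fused multiply-add counted as 2 FLOPs
DP_TO_SP_RATIO = 0.5        # double precision at half the single-precision rate

# 8 cores x 128 SMs gives the "~1024" stream cores
stream_cores = CORES_PER_SM * SMS_PER_CHIP

# cores x GHz x FLOPs per cycle gives GFLOPS; divide by 1000 for TFLOPS
fermi_dp_tflops = (
    FERMI_CORES * FERMI_CLOCK_GHZ * FLOPS_PER_FMA * DP_TO_SP_RATIO / 1000
)

# Ratio of Echelon's projected per-chip peak to Fermi's
speedup = ECHELON_CHIP_TFLOPS / fermi_dp_tflops

# Inferred: chips needed if the rack peak is just the sum of chip peaks
chips_per_rack = RACK_PFLOPS * 1000 / ECHELON_CHIP_TFLOPS

print(f"Stream cores per chip: {stream_cores}")
print(f"Fermi DP peak: {fermi_dp_tflops:.2f} TFLOPS")
print(f"Echelon vs. Fermi: about {speedup:.0f}x")
print(f"Implied chips per rack: about {chips_per_rack:.0f}")
```

Under these assumptions the numbers are consistent with the article: 1024 stream cores, about 0.79 TFLOPS for Fermi, and a gap of roughly 25 times.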
At present the Echelon is only a research project and these designs are not reflected on Nvidia’s roadmap.