Radio Free HPC Looks at Sunway TaihuLight – the World’s Fastest Supercomputer

In this podcast, Shahin Khan from OrionX joins the Radio Free HPC team for a look at the new TOP500 list of the world’s fastest supercomputers.

As announced this morning at ISC 2016, the fastest supercomputer on the TOP500 is the new Sunway TaihuLight system in Wuxi, China. Developed by the National Research Center of Parallel Computer Engineering & Technology (NRCPC), TaihuLight scored a whopping 93 Petaflops on the LINPACK benchmark. To put that in perspective, that is nearly three times faster than the previous #1 system, the Tianhe-2 supercomputer, which has moved to #2 after ruling the roost for some three years. TaihuLight is also more than five times faster than Titan, the 17 Petaflop machine at ORNL, which is still the fastest machine in the USA.

The Sunway TaihuLight Supercomputer in Wuxi, China is the world’s fastest supercomputer.

Now, before people start to panic that the US is falling way behind in the supercomputing race, I think it’s important to take a look at what TaihuLight is and what it is not. China is calling this a domestically designed supercomputer. In other words, they built it in-house and it does not use processor or accelerator technology from US companies like Intel and Nvidia.

Here is the rundown:

  • Linpack: 93 Petaflops (Rmax)
  • Peak performance: 125.4 Petaflops (Rpeak)
  • Processor: Sunway SW26010 1.4 GHz processor
  • Cores per socket: 260
  • Instruction Set: RISC instruction set developed by Sunway
  • Interconnect: the TOP500 submission says “Sunway design,” but Mellanox supplied the Host Channel Adapter (HCA) and switch chips. Sunway may not call it InfiniBand, but that is exactly what it is. China has political reasons for characterizing the overall system as domestic technology.
  • Cabinets: 40 Water-cooled cabinets, each with 3 Petaflops of peak performance
  • Power consumption: 15.37 Megawatts
  • Mflops/watt: 6051
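
A few of the ratios implied by this rundown can be checked with simple arithmetic. The sketch below uses only figures quoted in this article (93 Petaflops Rmax, 125.4 Petaflops Rpeak, 40 cabinets, and Titan at 17 Petaflops); the constant and function names are made up for illustration.

```python
# Figures as quoted in the article, in petaflops unless noted.
RMAX_PF = 93.0     # Linpack result (Rmax)
RPEAK_PF = 125.4   # theoretical peak (Rpeak)
CABINETS = 40      # water-cooled cabinets
TITAN_PF = 17.0    # Titan at ORNL, as quoted in the article


def linpack_efficiency(rmax_pf, rpeak_pf):
    """Fraction of theoretical peak achieved on Linpack."""
    return rmax_pf / rpeak_pf


def peak_per_cabinet(rpeak_pf, cabinets):
    """Average peak petaflops delivered by one cabinet."""
    return rpeak_pf / cabinets


def speedup(system_pf, baseline_pf):
    """How many times faster one system is than another."""
    return system_pf / baseline_pf


if __name__ == "__main__":
    print(f"Linpack efficiency: {linpack_efficiency(RMAX_PF, RPEAK_PF):.1%}")
    print(f"Peak per cabinet:   {peak_per_cabinet(RPEAK_PF, CABINETS):.2f} PF")
    print(f"Speedup over Titan: {speedup(RMAX_PF, TITAN_PF):.1f}x")
```

With these inputs, Linpack runs at roughly 74% of peak, each cabinet averages just over 3 Petaflops of peak, and TaihuLight comes out about 5.5 times faster than Titan.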

Now, the first reaction you are likely to hear from the US sector is that this is a stunt system. Along those lines, it is an unbalanced, floating-point heavy architecture that has no cache and not a whole lot of memory per core. I’ve heard it compared to Blue Gene/L.

But wait, there are three Gordon Bell submissions based on the new Sunway TaihuLight system.

Before you go ahead and label Sunway TaihuLight just a Linpack box, consider this from Jack Dongarra’s paper:

There are three submissions which are finalists for the Gordon Bell Award at SC16 that are based on the new Sunway TaihuLight system. These three applications are: (1) a fully-implicit nonhydrostatic dynamic solver for cloud-resolving atmospheric simulation; (2) a highly effective global surface wave numerical simulation with ultra-high resolution; (3) large scale phase-field simulation for coarsening dynamics based on Cahn-Hilliard equation with degenerated mobility. All these three applications have scaled to around 8 million cores (close to the full system scale). The applications that come with an explicit method (such as wave simulation and phase-field simulation) have achieved a sustained performance of 30 to 40 PFlops. In contrast, the implicit solver achieves a sustained performance of around 1.5 PFlops, with a good convergence rate for large-scale problems. These performance number may be improved before the SC16 Conference in November 2016.
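
To see how these application results compare with the machine's theoretical peak, here is a rough calculation. It assumes the 125.4 Petaflops Rpeak figure from the rundown and the sustained numbers quoted above; the names are illustrative only.

```python
RPEAK_PF = 125.4  # theoretical peak quoted in the article's rundown

# Sustained performance quoted from the paper, in petaflops.
SUSTAINED_PF = {
    "explicit methods (low end)": 30.0,
    "explicit methods (high end)": 40.0,
    "implicit solver": 1.5,
}


def fraction_of_peak(sustained_pf, rpeak_pf=RPEAK_PF):
    """Share of theoretical peak that an application sustains."""
    return sustained_pf / rpeak_pf


if __name__ == "__main__":
    for name, pf in SUSTAINED_PF.items():
        print(f"{name}: {fraction_of_peak(pf):.1%} of peak")
```

Under these assumptions, the explicit codes sustain roughly a quarter to a third of peak, while the implicit solver sustains a little over 1%.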

At this writing, Sunway says that the system is in full operation with a number of applications implemented and running in production. The Center will be a “public supercomputing center” that provides services for public users in China and abroad.

We’re looking forward to learning more about the Sunway TaihuLight System, but one thing is for sure: this is going to be the fastest machine in the world for years to come.

This is not a one-time effort from China. Not only do they now have the top two supercomputers, China also runs the world’s largest state-sponsored Student Cluster Competition, with over 170 university teams. The takeaway from today: China is serious about supercomputing, they are in it for the long haul, and they are willing to write the checks to make it happen.

Download the MP3 * Subscribe on iTunes * RSS Feed
