If all things had gone well and as expected with the IBM “Blue Waters” contract with the National Center for Supercomputing Applications at the University of Illinois, the Power 775 supercomputer nodes and their homegrown networking infrastructure would have been the big event at the SC11 supercomputer conference in Seattle this week.
Instead, Big Blue formally launched the BlueGene/Q machine at the conference and announced that its architecture can scale a lot further than many expected: as far as 100 petaflops, or one-tenth of the way to the holy grail of exascale computing.
Jim Herring, product manager for HPC offerings at IBM, called BlueGene/Q the worst-kept secret in supercomputing, and this is pretty much true. El Reg has been telling you about the machines, which were on show at the SC09 and SC10 conferences, for years now, and back in August we gave you the scoop on the 18-core PowerPC A2 processor at the heart of the BlueGene/Q supercomputer.
In February 2009, IBM announced that Lawrence Livermore National Laboratory, one of the US Department of Energy’s supercomputing centers, was shelling out big bucks to build a 20 petaflops machine that is now known as BlueGene/Q.
LLNL is where the first two BlueGene many-cored parallel monsters sharpened their teeth. In February this year, Argonne National Laboratory got a big chunk of change to install its own BlueGene/Q weighing in at 10 petaflops.
The BlueGene/Q machine is interesting for many reasons, and the 18-core PowerPC A2 chip is one of them. It has 16 cores for computation running at 1.6GHz, one core for running a full Linux operating system, and a spare core that is added to the chip merely to increase chip yields. That spare may eventually be used as a kind of hot spare. (See our detailed coverage of the PowerPC A2 chip from August.)
The 16 compute cores deliver 204.8 gigaflops of double-precision floating point math, and the chip draws only 55 watts. The Sequoia machine that LLNL is getting has 96 racks and will deliver 20 petaflops of peak performance in around 6.6 megawatts of power consumption.
The news this week is that IBM will start selling BlueGene/Q systems to other customers in the first quarter of 2012. Herring did not provide pricing, but said that Big Blue would charge “millions of dollars per rack.” And, as it turns out, the 5D torus interconnect that glues the BlueGene/Q nodes together actually scales to 512 racks and a whopping 100 petaflops.
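The peak figures for Sequoia and for a maxed-out 512-rack machine follow from simple multiplication. The sketch below assumes 8 double-precision flops per core per clock (which is what 204.8 gigaflops divided by 16 cores at 1.6GHz implies) and 1,024 compute nodes per BlueGene/Q rack; the per-rack node count is our assumption, not a figure IBM gave at the launch, and the function names are hypothetical.

```python
# Back-of-envelope peak-performance arithmetic for BlueGene/Q.
# Assumptions (not from IBM's SC11 briefing): 8 DP flops per core per clock,
# 1,024 compute nodes per rack, one 16-compute-core PowerPC A2 chip per node.

CORES_PER_CHIP = 16      # compute cores only; the OS and spare cores are excluded
CLOCK_GHZ = 1.6
FLOPS_PER_CYCLE = 8      # assumed: 4-wide double-precision fused multiply-add
NODES_PER_RACK = 1024    # assumed

def chip_peak_gflops() -> float:
    """Peak double-precision gigaflops for one PowerPC A2 chip."""
    return CORES_PER_CHIP * CLOCK_GHZ * FLOPS_PER_CYCLE

def system_peak_pflops(racks: float) -> float:
    """Peak petaflops for a machine with the given number of racks."""
    gflops = racks * NODES_PER_RACK * chip_peak_gflops()
    return gflops / 1e6  # 1 petaflop = 1,000,000 gigaflops

if __name__ == "__main__":
    print(f"per chip:          {chip_peak_gflops():.1f} GF")
    print(f"Sequoia, 96 racks: {system_peak_pflops(96):.1f} PF")
    print(f"max, 512 racks:    {system_peak_pflops(512):.1f} PF")
```

Under those assumptions Sequoia comes out at about 20.1 petaflops and a 512-rack machine at about 107.4 petaflops, in line with the round 20 and 100 petaflops figures.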
Another new data point is that the BlueGene/Q interconnect runs at 40Gb/sec and has a node-to-node hop latency of 2.5 microseconds. The resulting machine burns about 80 kilowatts per rack, which is an astounding power density.
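In a torus, every dimension wraps around on itself, so each node has two neighbours per dimension, ten links in all for a 5D torus. The sketch below illustrates that addressing idea only: the 4×4×4×4×4 shape is hypothetical and is not IBM's actual partition geometry, and the function names are ours, not part of any IBM software interface.

```python
from itertools import product

# Hypothetical, illustrative torus shape (1,024 nodes); not IBM's real geometry.
SHAPE = (4, 4, 4, 4, 4)

Coord = tuple[int, ...]

def neighbours(coord: Coord, shape: Coord = SHAPE) -> list[Coord]:
    """Return the 2 * len(shape) torus neighbours of coord, wrapping at the edges."""
    result = []
    for dim, size in enumerate(shape):
        for step in (-1, +1):
            n = list(coord)
            n[dim] = (n[dim] + step) % size  # modulo provides the wraparound link
            result.append(tuple(n))
    return result

def hops(a: Coord, b: Coord, shape: Coord = SHAPE) -> int:
    """Minimum number of links between a and b, going either way round each ring."""
    total = 0
    for x, y, size in zip(a, b, shape):
        d = abs(x - y)
        total += min(d, size - d)
    return total

if __name__ == "__main__":
    origin = (0,) * len(SHAPE)
    print(len(neighbours(origin)))   # 10 links per node
    print(neighbours(origin)[:2])    # [(3, 0, 0, 0, 0), (1, 0, 0, 0, 0)]
    # Worst-case distance from the origin, found by brute force over every node.
    all_nodes = product(*(range(s) for s in SHAPE))
    print(max(hops(origin, c) for c in all_nodes))  # 10
```

The wraparound links roughly halve the worst-case hop count compared with an open mesh of the same shape, which helps keep latency in check as node counts grow.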
Blue Waters run deep
IBM pulled the plug on the Power 775-based Blue Waters machine in August 2011, hinting that it was too expensive to manufacture at the price that had been negotiated many years ago. (This is also why NEC and Hitachi pulled out of the K supercomputer project, leaving all the manufacturing costs and now all the glory to Fujitsu.)
But that is not the end of the Power 775 machines. Herring tells El Reg that the Power 775 systems are indeed for sale and have shipped to a number of customers already.
These may not reach the sustained-petaflops level that NCSA hoped to install with its Blue Waters contract, but the dense packaging, high clock speed, and fast interconnect of IBM’s Power 775 nodes will appeal to some customers, even at the $150m per peak petaflop that IBM charges for the machines in a balanced configuration with a reasonable amount of CPU and storage.
A fully loaded, balanced Power 775 cluster would have 1,365 compute nodes with just under 350,000 Power7 cores and 2.7PB of main memory, plus 342 storage nodes with 26.3PB of disk/flash storage, delivering around 10.9 petaflops of peak number-crunching performance for a list price of $1.5bn.
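Dividing through the list figures above gives a rough feel for the money and the packaging. This sketch uses only the numbers quoted in the paragraph; "just under 350,000" cores is rounded up to 350,000, and street prices will certainly differ.

```python
# Rough list-price arithmetic for a fully loaded Power 775 cluster,
# using only the figures quoted in the article. Street prices will differ.

LIST_PRICE_USD = 1.5e9
PEAK_PFLOPS = 10.9
COMPUTE_NODES = 1_365
CORES_APPROX = 350_000   # article says "just under 350,000"

price_per_pflop = LIST_PRICE_USD / PEAK_PFLOPS
cores_per_node = CORES_APPROX / COMPUTE_NODES

print(f"${price_per_pflop / 1e6:.0f}m per peak petaflop")
print(f"~{cores_per_node:.0f} cores per compute node")
```

That works out to roughly $138m per peak petaflop at list for the fully loaded machine, and about 256 cores per compute node.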
Why would someone pay so much money for such a beast? Well, they almost certainly don’t pay that price at volume, although IBM would never confirm what the street price is for the Power 775 machines. “One size definitely does not fit all,” explained Herring at the BlueGene/Q unveiling. “We will sell iDataPlex servers with GPUs, and BlueGene/Q, and Power 775s.”
Whither the weather?
The European Centre for Medium-Range Weather Forecasts (ECMWF), based in Reading, England, has a four-rack, 8,192-core Power 775 cluster that ranks at number 55 on the latest Top 500 supercomputer rankings.
Those Power7 cores spin at 3.84GHz and deliver 251.4 teraflops of peak performance and 185.1 teraflops running the Linpack Fortran matrix math benchmark test. ECMWF also has a four-rack system with half that oomph (ranked 117th on the list) and a smaller two-rack system (ranked 483rd) as well.
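The peak rating can be roughly reproduced from the core count and clock, and the Linpack result expressed as a fraction of peak. The sketch assumes 8 double-precision flops per Power7 core per clock, which is our assumption rather than a figure from the article; it lands at about 251.7 teraflops, close to the listed 251.4, with the small gap possibly down to rounding of the clock speed.

```python
# Peak vs. Linpack for the 8,192-core ECMWF Power 775 cluster.
# Assumption: 8 double-precision flops per Power7 core per clock.

CORES = 8_192
CLOCK_GHZ = 3.84
FLOPS_PER_CYCLE = 8      # assumed

est_peak_tflops = CORES * CLOCK_GHZ * FLOPS_PER_CYCLE / 1e3  # gigaflops -> teraflops
listed_peak_tflops = 251.4
linpack_tflops = 185.1

print(f"estimated peak: {est_peak_tflops:.1f} TF (listed {listed_peak_tflops} TF)")
print(f"Linpack efficiency: {linpack_tflops / listed_peak_tflops:.1%}")
```

On the listed figures, the machine sustains about 73.6 per cent of its peak on Linpack.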
ECMWF has been a big user of IBM’s water-cooled Power 575 Power6+ nodes and earlier pSeries 575 nodes using Power5+ chips. So a Power 775 sale here is not all that surprising. The codes just move right on over to these systems. Prior to adopting IBM’s Power-based machines in 2004, ECMWF was a big Fujitsu vector supercomputer shop.
Environment Canada, which does weather forecasting for the Great White North, has two 8,192-core Power 775 clusters as well, although for some reason they are not networked together and were ranked below the UK machine at numbers 56 and 57 on the Top 500 list. (Well, you’re only part of the Commonwealth, not part of the United Kingdom of Great Britain and Northern Ireland, so what can you expect?) Environment Canada has also been a big user of prior Power 575 and pSeries 575 machines.
The United Kingdom’s Meteorological Office has similarly picked up two clusters of Power 775 machines, each with 7,680 cores and delivering 174.9 teraflops on the Linpack test; these rank numbers 62 and 63 on the Top 500. The Met Office historically used Cray massively parallel and then NEC vector supers, but switched to IBM Power 575 nodes for its flagship machines in 2005.
IBM has one 6,912-core Power 775 cluster and a smaller 2,816-core cluster at its Poughkeepsie, New York laboratory, but these hardly count in terms of sales.
Interestingly, number 96 on the list is a 5,504-core Power 775 cluster resold by Hitachi as the SR16000 Model M1/176 that is installed at Hokkaido University. And the Interdisciplinary Centre for Mathematical and Computational Modeling at the University of Warsaw in Poland, which has bought Cell-based BladeCenter machines for the past three years, has picked up a ten-node, 2,560-core Power 775 cluster rated at 64.3 teraflops. These are not necessarily all of the Power 775 machines IBM has sold since they began shipping in August, of course.
The BlueGene/Q machine is far more power-efficient than the Power 775 clusters. The half-rack BlueGene/Q machine in IBM’s lab, with 8,192 cores, burns 38.7 kilowatts and delivers 104.9 teraflops of peak and 65.4 teraflops of sustained performance on the Linpack test, for 1.7 gigaflops per watt.
The 8,192-core Power 775 cluster at ECMWF in England is rated at 251.4 peak teraflops and 185.1 sustained teraflops on Linpack, but it burns 501.5 kilowatts. That’s about 369 megaflops per watt on Linpack, or more than four times the juice per calculation. But again, the two machines have radically different architectures and are designed for very different workloads. ®
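Flops-per-watt comparisons only mean something when peak is compared with peak and Linpack with Linpack. This short sketch computes both for the two machines from the power and performance figures quoted above; the dictionary layout and function name are just for illustration.

```python
# Megaflops per watt, peak and Linpack, from the figures quoted in the article.
machines = {
    # name: (peak TF, Linpack TF, power in kW)
    "BlueGene/Q half rack": (104.9, 65.4, 38.7),
    "Power 775 at ECMWF": (251.4, 185.1, 501.5),
}

def mflops_per_watt(tflops: float, kw: float) -> float:
    """Convert teraflops and kilowatts into megaflops per watt."""
    return tflops * 1e6 / (kw * 1e3)  # TF -> MF on top, kW -> W below

for name, (peak, linpack, kw) in machines.items():
    print(f"{name}: {mflops_per_watt(linpack, kw):.0f} MF/W Linpack, "
          f"{mflops_per_watt(peak, kw):.0f} MF/W peak")
```

Dividing the Power 775's peak rating by its power draw gives about 501 megaflops per watt; on Linpack it manages about 369, against roughly 1,690 for the BlueGene/Q, so the BlueGene/Q gets around 4.6 times as much Linpack work out of each watt.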