A few weeks ago SGI launched the successor to its workhorse ICE 8200 systems, the Altix ICE 8400. The fourth generation of the ICE line, the 8400 is a blade-based, scale-out system offering distributed memory built on Intel and AMD compute technology. ICE is distinguished from the forthcoming Altix UV system (and from SGI's current-generation Altix 4700 systems) by its lack of shared memory.
SGI has tried to build a lot of choice into this line, offering five different compute blade configurations (both Intel and AMD) and a host of storage and interconnect options. A big change for the 8400 is increased backplane bandwidth: the system now offers a ratio of three switch fabric ports per node port, an advantage SGI claims over competing solutions that are closer to 1:1. The new configuration earned SGI a new record on the SPECmpiL_2007 benchmark, where the 8400 scored 51.3, outpacing the previous record holder, the ICE 8200, at 43.3.
In terms of blade choices, your five options are mostly distinguished by InfiniBand networking. The Intel blades can host either Nehalem or Westmere processors with up to 96 GB of memory. Systems can be configured with Mellanox QDR IB HCAs that are single port, dual port/single channel, or two single ports each with a dedicated PCIe channel. Your choice will obviously depend on the degree to which your particular workload is constrained by the cluster interconnect, but if you need lots of bandwidth or need to segregate your application and storage traffic, the last option is the way to go. Your options for the AMD Opteron 6100 blade are slightly more limited: either the dual port/single channel HCA or two single ports. The AMD blade is interesting in that it features eight DIMMs per CPU socket, or a total of 16 DIMMs per blade (the Appro AMD solution, for example, is an 8+4 configuration).
NASA saves 2 million node hours of downtime
The 8400 also offers several interconnect topology options, including the common hypercube and fat tree topologies as well as an all-to-all option and SGI's own enhanced hypercube. Either hypercube choice has an interesting capability versus typical competitor topologies (fat tree and 3D torus): SGI can upgrade a system with additional nodes without having to shut the entire machine down to re-cable the network. NASA took advantage of this feature in a recent upgrade of its Pleiades system, adding a new 512-core rack to the existing system while it was running a production workload, and it went on to add eight more racks of Altix ICE 8400 gear using the same approach. NASA Ames estimates that it saved 2 million node hours of downtime thanks to this live integration capability. SGI stated that while this is not a standard product offering, it can certainly work with customers to enable it at other sites as well.
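As general background on why hypercube-style topologies lend themselves to live expansion (this is textbook topology math, not SGI's actual cabling scheme): in a binary hypercube, each node links to the nodes whose IDs differ in exactly one bit, so growing from dimension d to d+1 doubles the node count while leaving every existing link in place. A minimal sketch:

```python
# Generic binary hypercube connectivity (illustrative only; SGI's
# "enhanced hypercube" is a proprietary variant and may differ).

def hypercube_neighbors(node: int, dim: int) -> list[int]:
    """IDs of the nodes directly linked to `node` in a dim-D hypercube:
    flip each of the dim bits of the node's ID."""
    return [node ^ (1 << bit) for bit in range(dim)]

# In a 3-D cube (8 nodes), node 0 links to nodes 1, 2, and 4.
print(hypercube_neighbors(0, 3))  # [1, 2, 4]

# Growing from 3-D (8 nodes) to 4-D (16 nodes): every old node keeps
# all of its existing links and simply gains one new link to its
# partner in the new half -- no existing cable has to move.
old_links = {n: set(hypercube_neighbors(n, 3)) for n in range(8)}
new_links = {n: set(hypercube_neighbors(n, 4)) for n in range(8)}
assert all(old_links[n] <= new_links[n] for n in range(8))
```

Contrast this with a fat tree, where adding nodes can force a rebalance of the switch tiers, or a 3D torus, where the wrap-around links typically have to be broken and re-cabled to insert a new plane.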
Another interesting aspect of the NASA installation is that it integrates a total of 32 racks of ICE 8400 into the existing 8200 system, a nice option if you aren't quite ready to toss out your old super.