Today Amazon CTO Werner Vogels announced on his blog that Amazon EC2 has added what it is calling Cluster Compute instances, specifically to support the kinds of closely coupled workloads that traditional HPC users often run. This is an important step in growing the relevance of EC2 resources to high performance computing, given the (unsurprising) benchmark results indicating that Amazon’s traditional, highly virtualized servers underperform on these types of applications (lots of writing on this, but see here and here for examples). Vogels acknowledges this in his post:
As much as Amazon EC2 and Elastic Map Reduce have been successful in freeing some HPC customers with highly parallelized workloads from the typical challenges of HPC infrastructure in capital investment and the associated heavy operation lifting, there were several classes of HPC workloads for which the existing instance types of Amazon EC2 have not been the right solution. In particular this has been true for applications based on algorithms – often MPI-based – that depend on frequent low-latency communication and/or require significant cross sectional bandwidth. Additionally, many high-end HPC applications take advantage of knowing their in-house hardware platforms to achieve major speedup by exploiting the specific processor architecture. There has been no easy way for developers to do this in Amazon EC2… until today.
The new offering gives users access to higher performance networks and lets them specify exactly the hardware they need to run on (though as far as I can tell, your networking options don’t include InfiniBand):
Cluster Computer Instances are similar to other Amazon EC2 instances but have been specifically engineered to provide high performance compute and networking. Cluster Compute Instances can be grouped as cluster using a “cluster placement group” to indicate that these are instances that require low-latency, high bandwidth communication. When instances are placed in a cluster they have access to low latency, non-blocking 10 Gbps networking when communicating the other instances in the cluster.
Next, Cluster Compute Instances are specified down to the processor type so developers can squeeze optimal performance out of them using compiler architecture-specific optimizations. At launch Cluster Computer Instances for Amazon EC2 will have 2 Intel Xeon X5570 (also known as quad core i7 or Nehalem) processors.
Amazon has also issued an official press release about the new offering. NERSC has been among those exploring the use of EC2 resources for scientific computing, as we reported earlier this summer, and they’ve seen positive results:
“Many of our scientific research areas require high-throughput, low-latency, interconnected systems where applications can quickly communicate with each other, so we were happy to collaborate with Amazon Web Services to test drive our HPC applications on Cluster Compute Instances for Amazon EC2,” said Keith Jackson, a computer scientist at the Lawrence Berkeley National Lab. “In our series of comprehensive benchmark tests, we found our HPC applications ran 8.5 times faster on Cluster Compute Instances for Amazon EC2 than the previous EC2 instance types.”
Since NERSC was reporting slowdowns of “over a factor of 10” (quote from Kathy Yelick in that NERSC story linked above), this puts Amazon notionally within striking distance of what you could do with your own cluster. When you factor in things like not having to have your own admins, floor space, and power and cooling, you get to an equation that starts to look like it’s worth seriously investigating.
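To make the "striking distance" arithmetic explicit, here is a rough sketch in Python. It assumes, and this is only an assumption, that the earlier "factor of 10" slowdown and the new 8.5x speedup were measured on comparable applications against comparable baselines. The variable names are purely illustrative.

```python
# Back-of-the-envelope estimate of the remaining gap between EC2 Cluster
# Compute instances and an in-house cluster.
# ASSUMPTION: the two reported figures are directly comparable (same kinds
# of applications, same baseline systems), which the sources do not confirm.

reported_slowdown = 10.0  # "over a factor of 10", so treat this as a lower bound
reported_speedup = 8.5    # speedup over previous EC2 instance types (press release)

# If the old instances were ~10x slower than an in-house cluster and the new
# ones are 8.5x faster than the old ones, the new ones are still slower by
# roughly this factor.
residual_slowdown = reported_slowdown / reported_speedup

print(f"Estimated remaining slowdown: at least ~{residual_slowdown:.2f}x")
```

Because the original slowdown was reported as "over" a factor of 10, the result is best read as a floor on the remaining gap, not a measurement.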
There is only a single offering in the Cluster Compute product line right now; here are the specs according to Amazon’s product page:
23 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
1690 GB of instance storage
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc1.4xlarge
Oddly, there is a default usage limit of 8 instances (64 cores), but the web page says if you need more you can send them an email.
The press release includes a Linpack performance measurement:
“For perspective, in one of our pre-production tests, an 880 server sub-cluster achieved 41.82 TFlops on a LINPACK test run – we’re very excited that Amazon EC2 customers now have access to this type of HPC performance with the low per-hour pricing, elasticity, and functionality they have come to expect from Amazon EC2.” (Peter De Santis, General Manager of Amazon EC2)
Assuming 2.93 GHz processors, that’s an Rmax of 41.82 TFLOPS on an Rpeak of 82.51 TFLOPS, or about 51% efficiency. For comparison, system number 162 on the Top500 is a 6400-core, GigE-connected Xeon X5570 (2.93 GHz) system that achieves 39.77 TFLOPS (Rpeak 75.01 TFLOPS) at an efficiency of 53%.
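For anyone who wants to check the Rpeak and efficiency numbers, the calculation can be sketched as follows. This assumes 4 double-precision floating point operations per core per clock cycle, the usual figure for Nehalem-generation Xeons, and 8 cores per server (2 quad-core sockets, per the specs above). The function and variable names are my own, for illustration only.

```python
# Theoretical peak (Rpeak) and LINPACK efficiency for the quoted figures.
# ASSUMPTION: 4 double-precision FLOPs per core per cycle (Nehalem, SSE).

FLOPS_PER_CYCLE = 4   # assumed for the Xeon X5570
CLOCK_GHZ = 2.93      # assumed clock speed, as in the text


def rpeak_tflops(cores, clock_ghz=CLOCK_GHZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in TFLOPS: cores x GHz x FLOPs/cycle, GFLOPS -> TFLOPS."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


def efficiency(rmax_tflops, rpeak):
    """Fraction of theoretical peak actually achieved on LINPACK."""
    return rmax_tflops / rpeak


# Amazon's pre-production run: 880 servers, 2 sockets x 4 cores each.
ec2_cores = 880 * 2 * 4
ec2_rpeak = rpeak_tflops(ec2_cores)
ec2_eff = efficiency(41.82, ec2_rpeak)

# Top500 comparison system: 6400 cores, GigE interconnect.
top500_rpeak = rpeak_tflops(6400)
top500_eff = efficiency(39.77, top500_rpeak)

print(f"EC2:    {ec2_cores} cores, Rpeak {ec2_rpeak:.2f} TFLOPS, efficiency {ec2_eff:.0%}")
print(f"Top500: 6400 cores, Rpeak {top500_rpeak:.2f} TFLOPS, efficiency {top500_eff:.0%}")
```

The same two functions work for any homogeneous cluster; only the core count, clock speed, and FLOPs-per-cycle assumption need to change.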