Comparing a traditional cluster with Amazon's EC2 on the NAS benchmarks and Linpack


I’ve had this paper on my desk to read for nearly a year now. It’s from Oct 2008, and it’s called “Benchmarking Amazon EC2 for High Performance Scientific Computing” by Edward Walker. It’s an interesting, quick read that compares performance results from the NAS benchmarks on one of NCSA’s clusters (Abe) and Amazon EC2; both clusters used dual-socket, quad-core 2.33-GHz Intel Xeon processors.

From the paper:

Instead, this article describes my results in using macro and micro benchmarks to examine the “delta” between clusters composed of currently available state-of-the-art CPUs from Amazon EC2 versus clusters available to the HPC scientific community circa 2008. My results were obtained by using the NAS Parallel Benchmarks to measure the performance of these clusters for frequently occurring scientific calculations. Also, since the Message-Passing Interface (MPI) library is an important programming tool used widely in scientific computing, my results demonstrate the MPI performance in these clusters by using the mpptest micro benchmark.
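As a rough illustration of the kind of measurement an MPI micro benchmark like mpptest performs, here is a minimal ping-pong sketch using mpi4py. It is not mpptest itself (mpptest is a C/MPI code with more sophisticated patterns, including the bisection tests discussed below); it simply estimates one-way latency and bandwidth between two MPI ranks.

```python
# Minimal MPI ping-pong sketch (not mpptest): estimates one-way latency and
# bandwidth between two ranks. Run with: mpiexec -n 2 python pingpong.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def pingpong(nbytes, reps=100):
    buf = np.zeros(nbytes, dtype=np.uint8)
    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(reps):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)
            comm.Recv(buf, source=1, tag=0)
        elif rank == 1:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(buf, dest=0, tag=0)
    return (MPI.Wtime() - t0) / (2 * reps)  # one-way time per message

small = pingpong(8)        # tiny message: dominated by latency
large = pingpong(1 << 20)  # 1 MiB message: dominated by bandwidth
if rank == 0:
    print(f"latency   ~ {small * 1e6:.1f} microseconds")
    print(f"bandwidth ~ {(1 << 20) / large / 1e6:.1f} MB/s")
```

Tests of this sort are where the order-of-magnitude gap between a commodity Ethernet interconnect and InfiniBand, discussed below, shows up.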

The results? EC2 always underperforms Abe, even when the work stays within a single node of both clusters. The two are closest for the OpenMP versions of the NPB on a single node of each cluster, with results on EC2 lower by 7 to 21%. On MPI versions of the NPB run on 32 CPUs, the results for EC2 are between 40 and 1,000 percent worse, and this is true even when the nodes don’t communicate.

Figure 2 shows the run times of the benchmark programs. From the results, we see approximately 40%–1000% performance degradation in the EC2 runs compared to the NCSA runs. Greater than 200% performance degradation is seen in the programs CG, FT, IS, LU, and MG. Surprisingly, even EP (embarrassingly parallel), where no message-passing communication is performed during the computation and only a global reduction is performed at the end, exhibits approximately 50% performance degradation in the EC2 run.
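For readers who want the metric made concrete, here is the arithmetic behind a slowdown figure like the ones quoted above; the runtimes in this sketch are made-up placeholders, not values from the paper.

```python
# Percent performance degradation of an EC2 run relative to the NCSA (Abe) run.
# The runtimes passed in below are illustrative placeholders only.
def degradation_pct(t_ec2, t_ncsa):
    return (t_ec2 / t_ncsa - 1.0) * 100.0

print(degradation_pct(30.0, 20.0))   # 50.0   -> roughly the EP case
print(degradation_pct(220.0, 20.0))  # 1000.0 -> the worst MPI cases
```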

The mpptest bisection results show that EC2’s interconnect performs about an order of magnitude worse than Abe’s InfiniBand network for both latency and bandwidth. From the paper’s conclusions:

The opportunity of using commercial cloud computing services for HPC is compelling. It unburdens the large majority of computational scientists from maintaining permanent cluster fixtures, and it encourages free open-market competition, allowing researchers to pick the best service based on the price they are willing to pay. However, the delivery of HPC performance with commercial cloud computing services such as Amazon EC2 is not yet mature. This article has shown that a performance gap exists between performing HPC computations on a traditional scientific cluster and on an EC2 provisioned scientific cluster. This performance gap is seen not only in the MPI performance of distributed-memory parallel programs but also in the single compute node OpenMP performance for shared-memory parallel programs.

These are very much the same conclusions reached by another set of researchers in a paper published in February of this year using Linpack, “Can Cloud Computing Reach the Top500?”

In this paper we investigate the use of cloud computing for high-performance numerical applications. In particular, we assume unlimited monetary resources to answer the question, “How high can a cloud computing service get in the TOP500 list?” We show results for the Linpack benchmark on different allocations on Amazon EC2.

…While cloud computing provides an extensible and powerful computing environment for web services, our experiments indicate that the cloud (or Amazon’s EC2, at least) is not yet mature enough for HPC computations. We observe that the GFLOP/sec obtained per dollar spent decreases exponentially with increasing computing cores and, correspondingly, the cost for solving a linear system increases exponentially with the problem size—very much in contrast to existing scalable HPC systems.
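To make the “GFLOP/sec per dollar” metric concrete, here is the trivial arithmetic involved; the instance price, allocation size, sustained performance, and run time below are hypothetical placeholders, not figures from the paper.

```python
# Illustrative cost-efficiency arithmetic (all numbers are hypothetical).
hourly_price_per_instance = 0.80  # $/instance-hour (assumed)
instances = 32
sustained_gflops = 180.0          # Linpack GFLOP/s for the whole allocation (assumed)
hours_to_solution = 2.5           # wall-clock hours for the chosen problem size (assumed)

dollars_per_hour = hourly_price_per_instance * instances
print(f"GFLOP/s per $/hour: {sustained_gflops / dollars_per_hour:.1f}")
print(f"cost to solution:   ${dollars_per_hour * hours_to_solution:.2f}")
```

The paper’s observation is that on EC2 the first number falls off quickly as cores are added, and the second grows much faster with problem size than it does on a conventional HPC system.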

Their suggestions for moving our community to higher levels of performance in the cloud?

If cloud computing vendors are serious about targeting the HPC market, different cost models must be explored. An obvious first step would be to offer better interconnects or nodes provisioned with more physical memory to overcome the slower network.

Of course, something to keep in mind with all of this is that it’s only useful to compare the performance of a cloud solution to a dedicated HPC solution when a user has such a choice. If there is no other alternative, EC2 is still infinitely better than nothing.


Comments

  1. Ian Foster says

    John:

    I really liked Ed Walker’s article. Of course, it is really just a commentary on the specific hardware and software configurations used by Amazon and NCSA. But certainly a useful reality check.

    I’d like to add another complementary perspective. As Ed shows, the NAS benchmarks execute faster on Abe than on EC2. However, what if one simply wants to run a NAS benchmark as soon as possible? In that case, the relevant metric is elapsed time from submission to the completion of execution.

    Let’s say we want to run the LU benchmark, which takes ~20 secs on Abe and ~100 secs on EC2. Now let’s add in queue and startup time.

    On EC2, it takes ~5 minutes to start up 32 nodes (depending on image size), so with high probability we will finish the LU benchmark within 100 + 300 = 400 secs.

    On Abe, we can use the QBETS queue time estimation service to get a bound on the queue time. When I tried this in June, I was told that if I want 32 nodes for 20 seconds, then:
    — With 25% chance we can get them within 100 secs,
    — With 50% chance within 1,000 secs,
    — With 85% chance within 10,000 secs.

    So, if I had to bet, I would have to go for EC2 as the “fastest” place to run the NAS LU benchmark!

    Of course this result reflects not the performance of the Abe vs. EC2 schedulers, but the specific scheduling policies (and loads) that they are subject to. Nevertheless, it does provide another useful perspective on the relative merits of commercial infrastructure-as-a-service providers vs. supercomputer centers.

    For more on this, see: http://www.slideshare.net/ianfoster/computing-outside-the-box-june-2009.

    Regards — Ian.

  2. After Ian turned his great comment into a post on his own blog about job start times, I took his example a little further and looked at total job run times for jobs representative of my own users, http://insidehpc.com/2009/08/05/queue-wait-time-assessing-cloud-hpc-performance/.

    As a meta comment, this is one of those rare instances in HPC blogging when blogs actually get used for a conversation, not just for broadcasts. Cool.

  3. There was also the work by Martin Sevior, Tom Fifield and Nobu Katayama on “Belle Monte-Carlo production on the Amazon EC2 cloud” where they ran benchmark runs of the full simulation chain to see at what point EC2 overtook the cost of owning a cluster.

  4. gordon bell says

    Ian makes a nice point and his slides provide more insight about cloud services. Bothering to compare EC2 or any Commercial service with public centers is moot because “State Computers” are “Free” with a cost/operation of zero, creating a cloud services market of zero. Until there’s a market for scientific computing, commercial services will do just fine handling data and not have to worry about MPI, fast interconnects or benchmarks. Anyone know how much “Free” costs?

  5. John West says

    Gordon – Thanks for reading and commenting! I think your question was probably rhetorical, but not everyone who reads here will know that “free” is about $16M a year, excluding acquisition costs, for a moderately sized “state” computing center. I hadn’t thought about it exactly that way before — from the perspective of state-sponsored computing killing any market for commercial computing solutions suitable for scientific computation. I had thought of them as parallel “markets”, with the state supporting an area of endeavor separate from the commercial endeavors. My belief is that ultimately the state will have to move out of its own datacenters, or pay much more than it is already, if we are to continue to sponsor effective large-scale scientific computing. To my mind this shift out of our own datacenters would be a good thing, as I think taxpayer money would be much better spent investing in the how of parallel software, not the transformers, cooling, wiring, and floor tiles.
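As a back-of-the-envelope companion to Ian Foster’s comment above, here is a minimal sketch of his time-to-result comparison. The runtimes, startup time, and QBETS queue-wait bounds are the figures he quotes; the script itself is only an illustration and is not part of his comment.

```python
# Time-to-result comparison for the NAS LU benchmark on 32 nodes,
# using the figures quoted in Ian Foster's comment.
EC2_STARTUP = 300.0  # ~5 minutes to boot 32 EC2 nodes
EC2_RUNTIME = 100.0  # LU benchmark runtime on EC2
ABE_RUNTIME = 20.0   # LU benchmark runtime on Abe

# (probability, queue-wait bound in seconds) for a 32-node request on Abe
abe_queue_bounds = [(0.25, 100.0), (0.50, 1000.0), (0.85, 10000.0)]

print(f"EC2: finished in ~{EC2_STARTUP + EC2_RUNTIME:.0f} s with high probability")
for prob, wait in abe_queue_bounds:
    print(f"Abe: {prob:.0%} chance of finishing within {wait + ABE_RUNTIME:.0f} s")
```

The EC2 line comes out at about 400 seconds; the Abe bounds put the chance of beating that somewhere between 25% and 50%, which is the basis for Ian’s bet.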