This week Penguin Computing announced the launch of a new service called “Penguin On Demand” — POD for short. The service is targeted specifically at the needs of scientific computing users, and Penguin is positioning it against the most successful of the on-demand computing resources available today:
“The most popular cloud infrastructures today, such as Amazon EC2, are not optimized for the high performance parallel computing often required in the research and simulation sciences,” said Charles Wuischpard, CEO at Penguin Computing. “POD delivers immediate access to high density HPC computing, a resource that is difficult or impossible for many users to utilize in a timely and cost-effective way. We believe POD will promote new and faster innovation by enabling access to scalable high performance computing to a much broader market.”
One of the important factors scientific computing users point to when assessing whether on-demand infrastructures are relevant to their needs is, not surprisingly, performance. So let’s start there. Raw application performance may not always be the primary motivator, but as you’ll read in this series of posts between Ian Foster and me (here, here, and here), it always underpins when your answer is available. Penguin’s offering is not virtualized, something that will immediately get the attention of HPC purists who are willing to eschew virtualization’s conveniences in exchange for high(er) performance.
Penguin built its on-demand compute system from Xeon 5400s running Linux, with 4 GB of memory per core and both 1 GbE and DDR InfiniBand networks. The rollout system was built before the Xeon 5500 launch, and Penguin will be upgrading it over time. Which resources you select depends on what you need and what your budget is: billing will vary with the computational performance of the resources you use (consistent with Amazon’s approach). The system uses cluster management software from Penguin, of course, via its Scyld toolset.
Josh Bernstein, Penguin’s HPC architect, sketched out the performance of the POD offering for me with two benchmarks. First, MPI ping-pongs: POD came in at 47 microseconds of latency and 20 MB/s of throughput, while EC2 measured 185 microseconds of latency and 5 MB/s of throughput. This kind of difference is consistent with what others have reported for network performance on EC2. Penguin has also been working with a new biomedical startup, CardioSolv, to understand the performance characteristics of its application on the POD system (according to Bernstein, the application is both CPU- and communication-intensive). Results on an 8-node configuration (using Amazon’s High-CPU instances on the EC2 side) show a runtime of 31.2 minutes on POD and 18.5 hours on EC2, putting POD about 35x faster than EC2 for this particular application. That gap is well beyond the 40%-1000% slowdown range that Walker cites in his paper comparing the performance of EC2 to one of the NCSA clusters on the NAS Parallel Benchmarks.
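As a sanity check, the 35x figure falls directly out of the two runtimes quoted above; a quick back-of-the-envelope calculation:

```python
# Runtimes reported for the CardioSolv application on 8 nodes.
pod_minutes = 31.2           # POD runtime, in minutes
ec2_minutes = 18.5 * 60      # EC2 runtime: 18.5 hours, converted to minutes

speedup = ec2_minutes / pod_minutes
print(f"POD is about {speedup:.1f}x faster for this run")  # ~35.6x
```

That works out to roughly 35.6x, matching the rounded figure Bernstein reported.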
In my conversation with Penguin, Charles Wuischpard differentiated POD from EC2 by explaining that Penguin’s variant is a “high touch” business. They want to work with scientists to understand what their needs are, help them pick the appropriate hardware to run on, and provide assistance getting the application ready to run (or even tuning it if desired). In fact, if a user wants to run one of the applications that Penguin’s POD team has experience with, they’ll actually set it up for you and send you a set of instructions and an example submission script to run it once you log in.
This may seem like a small convenience, but it’s the kind of thing that could really set Penguin apart from EC2. Deciding that trying out EC2 “some day” is a good idea is a long way from actually finding the time and energy to figure out what size resource you need, what software and libraries you need to pack up to load on your image, and how to get it all working right. This is the kind of thing you probably only have to figure out once, but now there is an option where users never have to figure it out at all (not on their own, anyway). This will matter to at least some users, and the size of that group will influence the success of the business.
The cost? This is a “high touch” model, which means Penguin wants to talk to you before quoting a price. But Wuischpard would commit “that the price/performance will be much more advantageous than EC2 as a result of both competitive per core hour pricing as well as demonstrated performance increases and wall clock reductions.” Just for reference, Amazon’s high-end rate is $0.80/core-hr. He also said that this price would likely come down for users who want to use a lot of time.
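To see how per-core-hour pricing and wall-clock reductions combine into price/performance, here is a minimal sketch. The EC2 rate is the $0.80/core-hr figure above and the runtimes are from the CardioSolv comparison; the POD rate ($1.50/core-hr) and the per-node core count are placeholder assumptions, since Penguin has not published pricing:

```python
def job_cost(runtime_hours, cores, rate_per_core_hour):
    """Total cost of one run under per-core-hour billing."""
    return runtime_hours * cores * rate_per_core_hour

cores = 8 * 8  # 8 nodes x 8 cores/node -- assumed core count

ec2_cost = job_cost(18.5, cores, 0.80)       # EC2 runtime and rate from the article
pod_cost = job_cost(31.2 / 60, cores, 1.50)  # hypothetical POD rate, much shorter wall clock

print(f"EC2: ${ec2_cost:.2f}   POD: ${pod_cost:.2f}")
```

The point of the sketch: even if POD charged nearly double Amazon’s rate per core-hour, the 35x shorter wall clock would dominate the total bill for a job like this one.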
What about tools and commercial software packages? Open source development tools are available, Bernstein explains, and the company is working on a model for including commercial software development tools (think TotalView, for example). The same situation applies to COTS software: if the customer already owns a license that permits the application to run at an off-site facility (some do, some don’t, so email your software rep to find out), then Penguin will help you get it running on their cluster. If not, you may be out of luck, for now at least. The problem is that many ISVs aren’t yet comfortable providing users and cycle providers with licenses on terms that essentially let users lease software only for the time they need it. I don’t think this is a sustainable line of business reasoning, and ISVs will ultimately have to develop a business model that includes this use case.
Will Penguin On Demand cannibalize the company’s mainline hardware business? Wuischpard doesn’t think so. He points out that, in his experience, people who buy hardware always buy as much as they can afford, and it’s rarely enough. He sees POD as a way for existing customers to augment their onsite computing capability with a surge resource that is familiar, and as a way for customers who aren’t currently in the market for a dedicated resource to still get the benefits of high performance computing in their business (or research, or whatever). Of course, cloud computing is the hot topic right now, and Penguin isn’t alone even in the much smaller niche of hosted computing aimed specifically at scientific users. Companies like New Servers and Nimbis Services are also in the fight to see which business model will resonate with customers. For users, it’s an interesting time to be short on cycles.