Eadline on the True Cost of HPC Cluster Ownership

Douglas Eadline has written up a very objective look at a growing concern in HPC: the true cost of any one procurement. Given the recent emergence of commodity clusters at the super scale, folks are beginning to find out first hand what real costs are associated with a production machine. Indeed, one can get quite the bang-for-the-buck when building a cluster. It’s all cheap, right?

Cluster purchases are often optimized by price (most raw hardware for the lowest cost). On paper such procurements often seem impressive as the performance is often rated in terms of raw hardware cost. In practice, however, integration and associated infrastructure costs often escape the performance accounting. These costs can often increase the total cost of ownership beyond user expectations and budgets.

Support and infrastructure costs can range from small to substantial depending on the user’s goal. In general, the more people that use the cluster, the higher the amount of work the end users must shoulder. Hidden costs for a cluster can be broken down into five categories: Integration, Validation, Maintenance, Upgrading, and Infrastructure. These topics will be discussed separately below.
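
To make the five categories concrete, here is a minimal sketch of how one might tally them against the raw hardware price. It is not taken from Eadline's article: the class name, the split into one-time versus yearly costs, and every dollar figure are hypothetical placeholders chosen only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class ClusterCosts:
    """Hypothetical cost model; all field names and the split into
    one-time and recurring costs are assumptions for illustration."""
    hardware: int                 # raw hardware purchase price
    integration: int              # assumed one-time
    validation: int               # assumed one-time
    maintenance_per_year: int     # assumed recurring
    upgrading_per_year: int       # assumed recurring
    infrastructure_per_year: int  # assumed recurring (power, cooling, space)

    def hidden(self, years: int) -> int:
        """Sum of the five hidden-cost categories over the given lifetime."""
        one_time = self.integration + self.validation
        recurring = (
            self.maintenance_per_year
            + self.upgrading_per_year
            + self.infrastructure_per_year
        ) * years
        return one_time + recurring

    def total(self, years: int) -> int:
        """Hardware price plus hidden costs over the given lifetime."""
        return self.hardware + self.hidden(years)


# Placeholder numbers only; they are not figures from the article.
example = ClusterCosts(
    hardware=100_000,
    integration=10_000,
    validation=5_000,
    maintenance_per_year=8_000,
    upgrading_per_year=4_000,
    infrastructure_per_year=12_000,
)

if __name__ == "__main__":
    years = 3
    print(f"Hardware only:       {example.hardware}")
    print(f"Hidden over {years} years: {example.hidden(years)}")
    print(f"Total over {years} years:  {example.total(years)}")
```

With these made-up inputs, the hidden categories come close to matching the hardware price over three years, which is the kind of gap a price-per-raw-hardware comparison would miss.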

What about integration? I’ve seen countless organizations balk at the notion of a “fully integrated” cluster. Indeed, any one professor may have an army of graduate students, but do they have experience in assembling and operating a production resource? Furthermore, the increased integration time associated with local labor costs money. The cost may not come out of funds immediately available, but it may come out of grants and funds associated with the next great breakthrough research [that can’t be done because the machine isn’t complete].

Because a cluster is built from multi-sourced components, the user is responsible for integration costs. These costs can be somewhat substantial and create a high maintenance cost if care is not taken when components are integrated.

All this, and no optimization, validation or ongoing maintenance has been performed.  As usual, Doug is right on the mark with his evaluation.  For all those considering a cluster purchase [small and large], I highly suggest taking a read.  You can find the article here at ClusterMonkey.