Defining HPC Cluster Success Factors


This article is part of the Five Essential Strategies for Successful HPC Clusters series, which was written to help managers, administrators, and users deploy and operate successful HPC clusters.

The ability to easily change, update, or add to the cluster contributes to the overall utilization rate. This requirement is where home-brew systems can fail. Very often, the highly customized nature of home-brew systems does not tolerate change, and can cause significant downtime while updates are made by hand. A successful cluster must be able to tolerate change.

When the cluster is running, it is important to be able to monitor and maintain the system. Since clusters are built from disparate components, the management interface must handle multiple technologies from multiple vendors. This responsibility often falls on the system administrators, who must create custom (and sometimes complicated) scripts that glue together information streams coming from various points in the cluster. A successful cluster should provide tools that simplify the administrator’s workload, rather than make it more complex.

Users will request new software tools or applications. These often have library dependency chains. New compute and storage hardware will also be added over time. Administrative practices that can facilitate change without huge disruptions are essential. Home-brew systems often operate on a “critical path” of software packages, where a change to one package can cause issues across a spectrum of applications. A successful cluster should accommodate users’ needs without undue downtime.
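To make the “critical path” idea concrete, here is a minimal sketch of how an administrator might estimate which installed software is exposed when one library changes. It is only an illustration: the package names and the `DEPENDS_ON` map are hypothetical, and a real cluster would read this information from its package manager or module system instead of a hand-written table.

```python
from collections import defaultdict, deque

# Hypothetical dependency map (illustrative names only):
# each package lists the packages it depends on directly.
DEPENDS_ON = {
    "app_cfd": ["libmpi", "libhdf5"],
    "app_genomics": ["libhdf5"],
    "libhdf5": ["libz"],
    "libmpi": [],
    "libz": [],
}


def affected_by(changed, depends_on):
    """Return every package that depends on `changed`, directly or transitively."""
    # Invert the map so we can walk from a library to its dependents.
    reverse = defaultdict(set)
    for pkg, deps in depends_on.items():
        for dep in deps:
            reverse[dep].add(pkg)

    # Breadth-first walk over the reversed dependency graph.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in reverse[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


if __name__ == "__main__":
    # Which packages should be retested if libz is updated?
    print(affected_by("libz", DEPENDS_ON))
```

In this toy map, a change to the low-level `libz` package reaches both applications through `libhdf5`, while a change to `libmpi` touches only one. The longer the chain, the more an update by hand risks disrupting unrelated users.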

Finally, a successful cluster also minimizes the administrative costs required to deliver these success factors. The true cost of operating a successful HPC cluster extends beyond the initial hardware purchase or power budget. A truly well-run and efficient cluster also minimizes the time, resources, and level of expertise administrators need to detect and mitigate issues within the cluster.

Next week’s article will look at Recommendations to Managing Cluster Growth. If you prefer, you can download the insideHPC Guide to Successful HPC Clusters in its entirety, courtesy of Bright Computing, by visiting the insideHPC White Paper Library.