A successful HPC cluster is a powerful asset for an organization, but it is also a multifaceted resource to manage. If not properly managed, software complexity, cluster growth, scalability, and system heterogeneity can introduce project delays and reduce the overall productivity of an organization. In addition, cloud computing models and the processing of Hadoop workloads are emerging challenges that can stifle business agility if not properly implemented. The following essential strategies are guidelines for the effective operation of an HPC cluster resource:
1. Plan to Manage the Cost of Software Complexity
Avoid creating your own cluster from scratch. This approach requires an expensive, expert administrative skill set. Consider a unified and fully integrated solution that does not require advanced administrator skills or large amounts of time to establish a production-ready HPC cluster.
2. Plan for Scalable Growth
Assume the cluster will grow in the future and make sure you can accommodate growth. Many homegrown tools do not scale, or are complicated to use as clusters grow and change. The best approach is to use a tested and unified technology that offers a low-overhead and scalable approach for managing clusters. Growth should not impact your ability to administer the cluster as change occurs.
3. Plan to Manage Heterogeneous Hardware/Software Solutions
Heterogeneous hardware is now present in virtually all clusters. Make sure you can monitor all hardware on all installed clusters in a consistent fashion. With extra work and expertise, some open source tools can be customized for this task. Few tools, however, are versatile and robust enough to manage all popular HPC hardware and software consistently through a single comprehensive GUI or CLI. Whatever monitoring solution you choose, it should not interfere with HPC workloads.
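As a rough illustration of what monitoring "in a consistent fashion" can mean in practice, the Python sketch below converts readings from two different node types into one common record, so that a single check works across the whole cluster. Everything in it is an assumption made for illustration: the raw field names (load_pct, temp_c, util, temp_f), the "cpu" and "gpu" labels, and the helper functions are hypothetical and do not correspond to any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class NodeSample:
    """One normalized reading, identical in shape for every hardware type."""
    node: str
    kind: str            # hypothetical labels: "cpu" or "gpu"
    utilization: float   # fraction in [0.0, 1.0]
    temperature_c: float


def normalize_cpu(node: str, raw: Dict[str, float]) -> NodeSample:
    # Assumed raw format: load as a percentage, temperature in Celsius.
    return NodeSample(node, "cpu", raw["load_pct"] / 100.0, raw["temp_c"])


def normalize_gpu(node: str, raw: Dict[str, float]) -> NodeSample:
    # Assumed raw format: utilization as a fraction, temperature in Fahrenheit.
    temp_c = (raw["temp_f"] - 32.0) * 5.0 / 9.0
    return NodeSample(node, "gpu", raw["util"], temp_c)


# One normalizer per hardware kind; supporting new hardware means adding an entry.
NORMALIZERS: Dict[str, Callable[[str, Dict[str, float]], NodeSample]] = {
    "cpu": normalize_cpu,
    "gpu": normalize_gpu,
}


def collect(raw_readings: List[dict]) -> List[NodeSample]:
    """Convert mixed raw readings into one consistent list of samples."""
    return [NORMALIZERS[r["kind"]](r["node"], r["data"]) for r in raw_readings]


def overheated(samples: List[NodeSample], limit_c: float) -> List[str]:
    """Return the names of nodes whose temperature exceeds limit_c."""
    return [s.node for s in samples if s.temperature_c > limit_c]
```

Because every reading ends up in the same shape, a check such as overheated() does not need to know which kind of hardware produced it. A real deployment would also have to gather the raw readings without disturbing running jobs, which this sketch does not address.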