Managing a Hadoop Cluster

This article is part of the Five Essential Strategies for Successful HPC Clusters series, which was written to help managers, administrators, and users deploy and operate successful HPC clusters. This installment covers managing a Hadoop cluster.

While Hadoop is not strictly an HPC application, its growing use in HPC is forcing systems administrators to answer the question: “Can I run Hadoop or HBase on this cluster?” (HBase is the Hadoop version of Google’s Bigtable database.)

Managing a Hadoop cluster is different from managing an HPC cluster. It requires mastering some new concepts, but from a management perspective the hardware is basically the same. In one sense, a Hadoop cluster is actually simpler than most HPC configurations: Hadoop clusters use Ethernet and generally just CPUs. Hadoop is an open-source project and is often configured “by hand” using various XML files. Beyond the core components, Hadoop offers a vast array of additional components that also need management and configuration. Hadoop is often configured as a “sub-cluster within a cluster,” where a collection of nodes is set aside to run the various Hadoop services. Hadoop processing differs from typical HPC processing because it has its own scheduler (YARN) and file system (HDFS). Integration with an existing cluster can be somewhat tedious, as a completely different set of daemons must be started on the Hadoop nodes.
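As a rough illustration of the hand-edited XML files mentioned above, the following sketch generates minimal Hadoop-style configuration documents. The `<configuration>`/`<property>` layout and the property names `fs.defaultFS` (normally set in core-site.xml) and `dfs.replication` (normally set in hdfs-site.xml) are standard Hadoop settings. The host name `namenode.example.com`, the port, and the helper name `build_hadoop_config` are assumptions made for illustration, not values from any particular cluster.

```python
import xml.etree.ElementTree as ET


def build_hadoop_config(properties):
    """Return a Hadoop-style XML configuration string from a dict of settings."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        # Each setting becomes <property><name>..</name><value>..</value></property>.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical values: the host name and port are placeholders.
core_site = build_hadoop_config({"fs.defaultFS": "hdfs://namenode.example.com:8020"})
hdfs_site = build_hadoop_config({"dfs.replication": 3})

print(core_site)
print(hdfs_site)
```

Even this small example hints at why hand configuration becomes tedious: every service reads its own file, and the files must stay consistent across all Hadoop nodes.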

Instead of configuring and managing a sub-cluster or specialized Hadoop system by hand, an automated solution such as Bright Cluster Manager can effortlessly bring Hadoop capability to a cluster. The latest version of Bright Cluster Manager (version 7) supports the leading distributions of Apache Hadoop (e.g., from the Apache Software Foundation, Cloudera, and Hortonworks), enabling Bright’s customers to choose the one that best fits their needs while taking advantage of Bright Cluster Manager’s advanced capabilities, as illustrated in Figure 5(a).

Figure 5(a): Bright Cluster Manager manages multiple instances of Hadoop HDFS simultaneously. Bright’s “Overview” tab for Hadoop shows essential Hadoop parameters, a key metric, and various Hadoop services.

Beyond installation and provisioning, Bright Cluster Manager provides comprehensive monitoring and management of the Hadoop cluster through the same graphical user interface used for HPC clusters.

Bright Cluster Manager collects a multitude of Hadoop-related metrics. Hadoop data node blocks, I/O and timings are illustrated here for a specific Hadoop instance (hdfs1) through Bright’s Cluster Management GUI. Note that Bright allows time series of Hadoop-specific metrics to be visualized through various styles of graphs.

Recommendations for Hadoop Strategy

  • Plan on users requesting Hadoop or HBase capabilities in the near future. Hadoop continues to mature, and is now in use complementing HPC and other types of clusters.
  • Hadoop clusters are configured differently from HPC clusters. Consider creating Hadoop sub-clusters within larger HPC clusters, or a separate stand-alone Hadoop cluster. Keep in mind that a Hadoop sub-cluster is restricted to Hadoop processing and uses its own workload scheduler.
  • Hadoop management is very different from HPC cluster management. A Hadoop cluster has a large number of “moving parts” (services). Develop a method to easily deploy, start, stop, and manage a Hadoop cluster.
  • To avoid costly delays and configuration headaches, consider the Hadoop management capabilities in Bright Cluster Manager. Its consistent and flexible provisioning and monitoring environment will benefit administrators when they introduce Hadoop capability to their users.
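
One way to approach the "deploy, start, stop, and manage" recommendation above is to record the service order once and derive both the start and stop sequences from it. The sketch below is only a dry-run planner that builds command strings; it does not run anything. It assumes the `hdfs --daemon` and `yarn --daemon` command form used by Hadoop 3.x (older releases use the `hadoop-daemon.sh` and `yarn-daemon.sh` scripts instead), and the ordering shown, HDFS before YARN, is an assumption rather than a requirement stated in this article. In practice each command would have to run on the node that hosts that daemon.

```python
# Assumed start order: HDFS daemons first, then YARN daemons.
START_ORDER = [
    ("hdfs", "namenode"),
    ("hdfs", "datanode"),
    ("yarn", "resourcemanager"),
    ("yarn", "nodemanager"),
]


def plan(action):
    """Return the daemon commands for 'start' or 'stop' as a list of strings."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    # Stopping walks the same list in reverse so dependants go down first.
    order = START_ORDER if action == "start" else list(reversed(START_ORDER))
    return [f"{tool} --daemon {action} {daemon}" for tool, daemon in order]


for command in plan("start"):
    print(command)
```

Keeping the order in a single list means that adding another service requires one new entry, and the stop sequence stays consistent with the start sequence automatically.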

A successful HPC cluster requires administrators to provision, manage, and monitor an array of hardware and software components. Current trends in HPC clustering include growing software complexity, cluster growth and scalability, system heterogeneity, cloud computing, and the introduction of Hadoop services. Without a cogent strategy to address these issues, system managers and administrators can expect less-than-ideal performance and utilization. There are many component tools and best practices to be found throughout the industry. Bright Cluster Manager is the only comprehensive solution that provides a clear management path to developing a successful HPC strategy.

We hope you enjoyed the insideHPC Five Essential Strategies for Successful HPC Clusters article series. You can download the entire insideHPC Guide to Successful HPC Clusters, courtesy of Bright Computing, by visiting the insideHPC White Paper Library.