Douglas Eadline has posted one of the more comprehensive looks I’ve seen in a long time at how to get started building an HPC cluster. He walks through the high-level architecture of a typical modern cluster, then takes detailed looks at network infrastructure and, most importantly, software. Unlike other attempts at writing a Cluster 101 article, Dr. Eadline presents the various options, technologies and vendors in a very unbiased manner.
If there is one thing to remember about clusters it is they are large systems composed of many individual parts. The goal is to use all of the parts to solve a single problem (i.e. run a program that uses all the component parts at the same time). All the parts must function as intended for the system to work as a “whole.” This aspect is what separates HPC clustering from many other forms of computing — and it is what makes it so exciting.
Doc Eadline breaks the cluster thought process down into five major points of interest.
- Clusters Are a System: Clusters are made up of three major hardware components: servers, networks and storage. The key to a successful cluster experience is configuring the three to operate in concert.
- The Network(s) Is What Holds The Cluster Together: Without the various management and compute networks, there would be no cluster. It’s essential to configure and tune your network based on your computational needs.
- Software, Software, Software: Operating systems, parallel processing methodologies, compilers and cluster management systems all make the cluster world go round.
- Benchmarks Rule: There is no inherent way to know that your machine is performing correctly without the use of rudimentary benchmarks.
- We Need More Than Five Topics: Clustering within the high performance computing realm has become an industry within an industry. There is a myriad of information and a grand following of users. If you don’t know the answer, ask!
All HPC greenhorns considering a cluster procurement or building their own should most definitely take notice of Dr. Eadline’s article. You can read the full article at HPCCommunity.org here.