HPC architecture for beginners

A high-performance computer suited to most small and medium-sized businesses today is built from what are essentially many ordinary computers, connected by a network and coordinated centrally by special software. Because these computers usually sit physically close together, the common term for the kind of high-performance computer you’d use in your business today is a cluster.

Because a cluster is just a collection of computers joined by a network and by special software that helps them work together, you’re probably already familiar with the components used to build one. All the basics are still there: processors, disk, memory, and so on. The primary difference is one of scale: a cluster has more of everything.

Performance Complexity

There is also more performance complexity. Consider a simple real-world task like putting 1,000 index cards into alphabetical order (imagine a stack of vocabulary words for a foreign-language class). One person would take a long time to do it, but it isn’t at all hard to figure out how: she simply sits down and alphabetizes the cards using knowledge she already has in her head (the order of the alphabet), without needing to talk to or coordinate with anyone else.

If we wanted to get it done faster, we could recruit more people to help. But adding even one person dramatically increases the complexity. The two people now must work together in some way: even if each starts by sorting their own stack without speaking to the other, they still have to cooperate to combine their two sorted stacks into one. And there are many ways to do even that seemingly easy step.
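The two-person approach described above (split the cards, sort each stack independently, then cooperate on a merge) can be sketched in a few lines of Python. This is an illustration of the idea, not how any particular cluster software works: the two "people" here run one after the other in a single program, the function names are hypothetical, and the merge uses the standard library's `heapq.merge`, which is just one of the many possible ways to combine two sorted stacks.

```python
import heapq
import random


def sort_by_one_person(cards):
    """One worker alphabetizes a whole stack alone; no coordination needed."""
    return sorted(cards, key=str.lower)


def sort_by_two_people(cards):
    """Split the stack in half, let each 'person' sort a half independently,
    then combine the two sorted halves in a merge step."""
    middle = len(cards) // 2
    stack_a = sort_by_one_person(cards[:middle])  # person A works alone
    stack_b = sort_by_one_person(cards[middle:])  # person B works alone
    # The merge is where the two must cooperate: repeatedly compare the top
    # card of each stack and take whichever comes first alphabetically.
    return list(heapq.merge(stack_a, stack_b, key=str.lower))


# A handful of vocabulary cards, shuffled into a random order.
words = ["gato", "perro", "casa", "libro", "agua", "mesa", "sol", "luna"]
random.shuffle(words)

print(sort_by_two_people(words))
# ['agua', 'casa', 'gato', 'libro', 'luna', 'mesa', 'perro', 'sol']
```

The independent sorting steps need no communication at all; only the final merge requires the two workers to exchange cards. On a cluster, that exchange is exactly the part that travels over the network.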

The same problems exist in cluster computing. Getting processors to work together on a single task is largely a software challenge, but the software depends on hardware: two processors in different computers cannot communicate unless they are connected to the same network. You’ll need to make sure you have the right processors for the computing tasks you run, the right amount of memory and disk, the right cluster interconnect, and so on.

All the Components Working Together

HPC architecture is more complex than simply selecting a set of components and putting them together, because the components interact with one another to determine the performance your application actually gets. For example, Gigabit Ethernet running TCP/IP may be easier for your team to administer and cheaper to buy, but if your application runs so much more slowly on it that you have to buy more processors to compensate, you might be better off buying an InfiniBand network (these terms may not make sense to you now, but read on in the HPC 101 section for explanations of these technologies).
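The trade-off above comes down to simple arithmetic: a cheaper network that slows your application means more nodes to reach the same amount of work per day. The sketch below makes that calculation explicit. Every number in it (jobs per day, node prices, network prices) is made up purely for illustration, the function names are hypothetical, and it ignores real factors such as administration effort, power, and space. Substitute figures from your own quotes and benchmarks.

```python
import math


def nodes_needed(jobs_per_day_required, jobs_per_node_per_day):
    """Smallest whole number of nodes that meets the required throughput."""
    return math.ceil(jobs_per_day_required / jobs_per_node_per_day)


def cluster_cost(jobs_per_day_required, jobs_per_node_per_day,
                 cost_per_node, network_cost_per_node):
    """Return (node count, total purchase cost) for one network choice.

    Cost is modeled simply as nodes x (node price + per-node network price).
    """
    n = nodes_needed(jobs_per_day_required, jobs_per_node_per_day)
    return n, n * (cost_per_node + network_cost_per_node)


# HYPOTHETICAL inputs: the application is assumed to finish 4 jobs per node
# per day on the cheaper network and 6 on the faster one.
cheaper_network = cluster_cost(100, 4, cost_per_node=5000, network_cost_per_node=200)
faster_network = cluster_cost(100, 6, cost_per_node=5000, network_cost_per_node=1000)

print(cheaper_network)  # (25, 130000)
print(faster_network)   # (17, 102000)
```

With these invented figures, the more expensive network yields the cheaper cluster because it needs eight fewer nodes; with different figures the answer could easily flip, which is why the real numbers have to come from measuring your own application.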

Try before you buy: benchmarking

The key to buying a successful cluster for your business is to test what you are buying first, with the actual applications you intend to run (or ones very similar). This process is called benchmarking, and it is critical to buying a machine that does what you need. Many vendors either have published detailed performance studies you can consult or offer test clusters that let you see how your application will perform before you buy. Be aware, though, that the vendor will probably have to help you with benchmarking, and this takes both time and money. Don’t waste either, and don’t start benchmarking if you are just kicking the tires. If you aren’t seriously prepared to buy when you find a solution that screams for your application, just read the brochures.
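At its simplest, benchmarking means running the same real workload several times on a candidate system and recording how long it takes. The minimal timing harness below is a sketch under that assumption: it launches a command repeatedly, measures wall-clock time for each run, and reports the median, fastest, and slowest runs. The function names are hypothetical, and the demo command is only a placeholder; you would replace it with whatever command launches your own application and a realistic input on the test cluster.

```python
import statistics
import subprocess
import sys
import time


def benchmark(command, runs=5):
    """Run `command` (a list of program arguments) `runs` times and return
    the wall-clock duration of each run in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises an error if the application fails, so a crashed
        # run is never mistaken for a fast one.
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Report the median (less sensitive to one unusually slow run than the
    mean) along with the fastest and slowest runs."""
    return {
        "runs": len(timings),
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


if __name__ == "__main__":
    # PLACEHOLDER workload: a short Python computation standing in for
    # your real application.
    demo = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    print(summarize(benchmark(demo, runs=3)))
```

Running the same harness against each candidate configuration gives you comparable numbers, and a large gap between the fastest and slowest runs is itself worth discussing with the vendor.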

Read on to learn about the computers that go into a cluster.