What kind of solution will work in my environment?

Note: if you aren’t already familiar with what HPC is or how some businesses are using it today, you might find the articles in Basic Training helpful.

For more on what the hardware elements of an HPC cluster are, see the articles in HPC 101.

Deciding what kind of HPC solution makes the most sense for your business starts with knowing what problem you are trying to solve, and then identifying the applications that will help you solve it. These two things will drive everything you do in HPC. (If you haven’t already read Where does HPC Fit?, you may find it helpful to read that article now.)

Get started slowly by borrowing time

One of the latest options to hit the street for companies wanting to adopt HPC in their business is actually an old idea made new: buying just the time you need on someone else’s computer.

Back when all computers, not just a few of the most powerful systems in the world, were room-sized monsters, almost no one could afford to have their own system. So big companies like IBM kept systems at their headquarters and rented time on them to businesses that needed access. This trend has returned today with a new name: “cloud computing.”

The idea is the same: a company like Amazon, with its Elastic Compute Cloud, or Penguin Computing, with its Penguin On Demand solution, or NewServers, buys, builds, and maintains a computer cluster for other people to use. Once you have an account, you tell the company how many processors you need and for how long, and then pay only for what you use. Need 16 processors for an hour next Thursday? Not a problem. With Amazon’s offering that could cost you as little as $1.60.
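To make the pay-for-what-you-use arithmetic concrete, here is a minimal sketch of how such a bill might be estimated. The function name and the flat rate of $0.10 per processor-hour are assumptions for illustration: the rate is simply what the $1.60 figure for 16 processors for one hour works out to, and real providers price by instance type and may add charges for storage or data transfer.

```python
def rental_cost(processors: int, hours: float, rate_per_processor_hour: float) -> float:
    """Estimate a pay-per-use bill in dollars: processors x hours x hourly rate."""
    if processors < 0 or hours < 0 or rate_per_processor_hour < 0:
        raise ValueError("inputs must be non-negative")
    return processors * hours * rate_per_processor_hour


# Assumption: a flat $0.10 per processor-hour, the rate implied by the
# article's example ($1.60 / 16 processors / 1 hour). Actual pricing varies.
ASSUMED_RATE = 0.10

# The example from the text: 16 processors for one hour.
cost = rental_cost(processors=16, hours=1, rate_per_processor_hour=ASSUMED_RATE)
print(f"Estimated cost: ${cost:.2f}")
```

Changing the processor count or the number of hours gives a quick feel for how a rented cluster's bill scales with the size of the job.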

This option can be convenient, but it may not be right for you. You’ll need to work with your computing cycle provider to find out whether the software application you need is available on their system, or whether your license permits you to run it there. Also, if you are brand new to high performance computing, you may want more hand-holding than a company like Amazon is prepared to give, so you’ll want to choose a provider who is able to work with you to meet your needs. The good news is that once you have a workflow and know your way around the process, you’ll be able to pop in and out of your rented high performance computer as much (or as little) as your business needs.

Lock, stock, and barrel

Another option is to work with a company to buy a turnkey solution, ready to plug in. All of the major computer manufacturers from Dell and IBM to SGI and Cray offer small clusters that are designed to run in your office right out of the box. No special power, no special cooling. Generally these systems are going to start around $10,000 for a relatively small but capable cluster. You’ll want to shop around a little to find a vendor who is willing to work with you in a way that you are comfortable with and who can provide the level of service you need. Computer manufacturer Cray, for example, offers a small deskside cluster that you can order with Windows or Linux pre-installed and ready to run your app. They also have a worldwide team of resellers and support specialists who can work with you from concept to installation and operation.

Do it yourself

At the other end of the spectrum is the do-it-yourself option. You can order the compute nodes (see here for an introduction to the elements of an HPC solution if you aren’t familiar with this terminology) from one vendor, the network from another, and storage from a third, and then install an open source cluster kit yourself. In fact, before there were commercial solutions for small clusters, this is how small research groups and universities did it (back then they were called Beowulf clusters).

But be warned: this road is not for the faint of heart. It can be fairly easy to assemble the hardware, install an open source cluster toolkit like Rocks or OSCAR, and end up with something that runs. But you may also create a nightmare of dependencies for yourself that in the end will turn your cluster into a very large boat anchor hanging around your neck. And your business probably doesn’t need another anchor.

Although small HPC clusters are much less complex than their supercomputer cousins, there are still a lot of moving parts. Are you sure the operating system you downloaded from the Internet will support the particular hardware you have installed? And not just the processors: what about the memory, the interconnect, the disks, and the filesystems? What about your applications? Which version of MPI should you install? Heck, what is MPI?

Also, what about when you patch the operating system and the whole cluster just stops working? Are you prepared to spend days finding and solving the problem, or do you want to just pick up the phone and call someone?

If you decide to go it alone, there are a lot of resources on the internet to help you. It can also help to buy at least some of the parts pre-assembled and built to an industry convention (like the Intel Cluster Ready program).