Note: If you aren’t already familiar with what HPC is or how some businesses are using it today, you might find the articles in Basic Training helpful.
Once you’ve gotten your head around what HPC is, you’ll probably start thinking about where it might fit in your business. It all starts with identifying the problem you want to solve and the software you’ll need to solve it.
What’s the problem?
There are almost as many uses for HPC in solving problems as there are problems to be solved. Looking to optimize the flow and placement of merchandise in your retail stores? HPC can help. Need to test ten bridge designs in the time it usually takes to finish just one? HPC can help there, too. Want to give your clients an investment strategy tailored to the current market and their risk tolerance, but don’t have the horsepower to hindcast more than ten years of market data? HPC is there to help you, too.
The key to knowing whether an HPC investment makes sense for you is knowing what you want to accomplish. We are big believers in the power of HPC and supercomputing — we’ve spent our whole careers doing it! But buying an HPC cluster for your business with no clear idea of what you want from it doesn’t make any more sense than buying a forklift for a knitting company. Unless you are moving a lot of yarn, you probably aren’t going to get much out of that investment.
There is a lot of interest in HPC today, and it’s probably not too hard to find at least one major competitor in your own industry who is already using HPC to make their business better. There is a good chance that you can identify one or more problems in your business that 10, or 100, or 1,000 times more computing power could help you solve. But you need to be sure that it is a problem you actually need to solve, and that you need to solve it today.
What tools do you need to solve the problem?
Once you’ve found one or more areas where you think that HPC can help you move your business forward, you’ll need to identify the software that can help you get it done. Depending on what you do, this might be easy. Or it could be really, really hard.
The thing that makes a typical HPC cluster faster than a single desktop or workstation is that it has more than one processor doing the work. Each processor in the cluster works on a small piece of a much larger problem that none of them could solve on its own. Processing a million records in a customer database and cross-referencing them with transaction history and industry segment might take two weeks on your laptop but only around ten hours on a 32-node cluster, because each processor only has to do 1/32nd of the work, and they can all work at the same time (roughly).
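To make that concrete, here is a minimal sketch in Python of how that kind of record-crunching job can be split across workers. The records and the process_record function are hypothetical stand-ins for whatever your real analysis does, and on a real cluster a job scheduler would spread the workers across nodes rather than a single process pool on one machine.

    from multiprocessing import Pool

    def process_record(record):
        # Hypothetical stand-in for the real work: cross-referencing a
        # customer record against transaction history, industry segment, etc.
        return len(record)

    def main():
        records = ["record-%d" % i for i in range(1_000_000)]  # placeholder data
        # 32 workers, each handling roughly 1/32nd of the records.
        with Pool(processes=32) as pool:
            results = pool.map(process_record, records, chunksize=10_000)
        print("processed", len(results), "records")

    if __name__ == "__main__":
        main()

Because no record depends on any other, the workers never have to talk to each other, which is exactly what makes this kind of job so easy to speed up.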
Parallelism, embarrassing and otherwise
Getting all of these computers to work together on one big problem may not be easy, depending upon the type of work you have to get done. Let’s walk through an example in the real world.
Let’s say you are building a brick wall. The brick company delivers the bricks in a big pile, and your first job is to stack all the bricks into neat cubes of bricks on pallets so that they can be moved easily around the job site. It’s a big pile, and the more workers you have grabbing bricks out of the pile and creating neatly stacked pallets, the faster the work goes. Each worker can do his work without talking to any of the other workers, so each can proceed at his own pace, stacking bricks as fast as he can go.
This kind of problem is called embarrassingly parallel, meaning that breaking the larger problem into smaller pieces is relatively easy to do, and each of the pieces of work can be done without any knowledge of the other pieces. Examples of this kind of problem that you might find in your business could include simulations of different designs, or different searches over the same set of data.
Embarrassingly parallel problems are easy to understand, but they don’t pop up all that often. This makes sense if you think about your experiences solving problems in the real world. Back to our brick example: now that everything is stacked up, we are ready to build our wall. It’s going to take one mason a year to build the wall, and we’d like to get it done sooner. We can add more masons to do the work faster, but only up to a point. A course of bricks cannot be laid until the course below it is in place, and you can only lay so many courses before work has to pause while the mortar dries. The masons also have to talk with one another to keep their work coordinated so that the end result is a wall that is straight and true. So you might get the build time down to three months by adding three more masons, but you cannot get it down to one day by adding 300 masons, because the problem cannot be broken down into arbitrarily small, independent units.
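The standard way to put a number on this limit (the article’s wall is a good stand-in for it) is Amdahl’s law: if a fraction s of the job simply cannot be parallelized, the best speedup N workers can give you is 1 / (s + (1 - s) / N). The quick calculation below assumes, purely for illustration, that 5 percent of the wall-building job is inherently serial.

    def amdahl_speedup(serial_fraction, workers):
        # Best-case speedup when `serial_fraction` of the job cannot be parallelized.
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)

    # Assumed serial fraction of 5% (waiting for mortar, coordinating courses).
    for workers in (1, 4, 32, 300):
        print(workers, "workers:", round(amdahl_speedup(0.05, workers), 1), "x faster")

With those assumed numbers, four masons get you roughly a 3.5x speedup, but 300 masons top out below 20x, which matches the intuition that the wall is never going up in a day no matter how many people you hire.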
From bricks to software
Ok, so what does this mean for HPC in your business? If your problem can be broken down into many completely separate pieces that don’t depend upon one another, then getting value out of your HPC machine will be straightforward. You’ll just run 10 copies of your software on the 10 computers in your cluster, and your work will go roughly 10 times faster. You probably won’t need to change any of your tools or workflow.
But if your problem is not embarrassingly parallel, then the software you use to solve it has to know about these constraints and dependencies and manage getting all of the processors to work together at the same time. Depending upon what your business does, there might already be a commercial solution. For example, ANSYS provides numerical simulation software that helps engineers design everything from bridges to propellers by exploiting the parallelism in high performance computers. If there isn’t a commercial solution, you are going to have to find open source software that already solves your problem, or partner with a software company to create or modify software for your situation. That custom route is obviously expensive, and you’ll want to make sure you understand the business case before making such an investment.
Dividing the original large problem into smaller chunks, getting that work out to the processors, getting the processors to cooperate on a solution, and then putting all of the pieces back together into one answer is the discipline of parallel computing, and in some cases it is notoriously difficult to do well.
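As a taste of what that coordination looks like in practice, here is a minimal sketch using mpi4py, a Python wrapper around MPI, the message-passing standard most HPC clusters use. The partial_work function is a hypothetical placeholder; the point is the pattern: every process takes its own slice of the problem, and a collective call (reduce) stitches the partial answers back into one result.

    from mpi4py import MPI

    def partial_work(start, stop):
        # Hypothetical placeholder for this process's share of the real computation.
        return sum(range(start, stop))

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # which process am I?
    size = comm.Get_size()   # how many processes are running?

    n = 1_000_000
    chunk = n // size
    start = rank * chunk
    stop = n if rank == size - 1 else start + chunk

    partial = partial_work(start, stop)

    # Combine everyone's partial result into a single answer on process 0.
    total = comm.reduce(partial, op=MPI.SUM, root=0)
    if rank == 0:
        print("combined result:", total)

You would launch something like this with mpirun -n 32 python example.py, and MPI takes care of spreading the 32 processes across the nodes of the cluster. Real parallel applications add far more coordination than this, but the divide, compute, and combine shape stays the same.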