This is the first entry in an insideHPC series that delves into in-memory computing and the designs, hardware and software behind it. This series, compiled in a complete Guide available here, also covers five ways in-memory computing can save money and improve TCO.
To achieve high performance, modern computer systems rely on two basic methodologies to scale resources. Each design attempts to bring more processors (cores) and memory to the user. A scale-up design that allows multiple cores to share a large global pool of memory is the most flexible and allows large data sets to take advantage of full in-memory computing. A scale-out design distributes data sets across the memory on separate host systems in a computing cluster. Although the scale-out cluster often has a lower hardware acquisition cost, the scale-up in-memory system provides a much better total cost of ownership (TCO) based on the following advantages:
- Lower system administration costs
- No difficult and costly cluster software conversion (MPI)
- Better utilization of resources
- More software capability with incremental scalability
- Better resource productivity and easier system upgrades
Two case studies illustrate these cost advantages when using in-memory scale-up UV™ computer systems from SGI. The Genome Analysis Centre, based in Norwich, UK, was able to provide a highly flexible application execution environment, familiar programming tools, and standard Linux single-system administration that far exceeded previous cluster-based systems. Qorvo, a leading provider of Radio Frequency (RF) solutions for mobile, infrastructure and aerospace/defense, was able to reduce operator errors by lowering complexity and at the same time achieve better utilization efficiency by using an SGI UV™ in-memory system.
In-Memory Computing for HPC: The Way It Is Supposed to Work
The fundamental method behind all forms of computing consists of a processor that executes instructions and memory that stores data. General-purpose processors have become faster and denser, but the basics of instruction execution remain the same. Memory speeds and densities have also increased, but have lagged behind those of processors. The same can be said of non-volatile storage (disk drives), which offers large amounts of permanent storage but at much lower speeds than memory and processors.
Computer programming languages are used to supply instructions to the processor. From a historical perspective, this model has allowed computers to be used for all sorts of problems using many types of computer languages. Early languages like Fortran, Lisp, or Cobol gave the programmer portability between hardware by moving the abstraction level away from the computer and closer to the application at hand.
Ideally, when solving a problem with a computer, the best performance is obtained by loading all data and the program (instructions for the processor) into memory and “running the program.” There are other details that may affect performance, such as processor cache, but the general goal is to get everything into memory and execute the program.
There are two reasons this method must be augmented with other approaches and technology:
- The data set does not fit into the available memory.
- A single processor is not fast enough to solve the problem in a reasonable amount of time.
Indeed, in some areas such as high performance computing (HPC), database, and analytics, the problems are so big that both memory size and processor speed are pushed to the limits.
There are two major approaches used to expand the single processor/memory model. Both increase the number of processors that run the program instructions; that is, the instructions are spread over multiple processors. This requires that the program be broken into independent subsets of instructions that can run on different processors at the same time. If the application has such concurrent sections, it can be executed in a “parallel” fashion, much like using multiple bricklayers to build a brick wall. It is important to remember that the amount and efficiency of the concurrent portions of a program determine how much faster it can run on multiple processors. Not all applications are good candidates for parallel execution.
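The dependence of speedup on the concurrent portion of a program can be made concrete with Amdahl's law, a standard model that the article does not name but that matches the point above. The sketch below is illustrative only: the function name `amdahl_speedup` and the sample fractions are assumptions chosen for the example, and the model ignores communication and memory-access overheads that real systems incur.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the run time that can execute concurrently (0..1).
    processors: number of processors applied to the concurrent share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    # The serial share always runs on one processor; only the
    # concurrent share is divided among the processors.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Hypothetical parallel fractions, for illustration only.
    for fraction in (0.50, 0.90, 0.99):
        for count in (4, 16, 64):
            speedup = amdahl_speedup(fraction, count)
            print(f"parallel={fraction:.2f} processors={count:3d} "
                  f"speedup={speedup:.2f}")
```

Under this model, a program that is 90% concurrent can never run more than 10 times faster, no matter how many processors are added, because the remaining 10% still executes serially.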
Over the next few weeks this series on in-memory computing will cover the following additional topics:
- Scaling Hardware
- Scaling Software
- Five Ways In-Memory Computing Saves Money and Improves TCO
You can also download the complete report, “insideHPC Research Report on In-Memory Computing: Five Ways In-Memory Computing Saves Money and Improves TCO,” courtesy of SGI and Intel.