This is the fourth and final entry in an insideHPC series that delves into in-memory computing and the designs, hardware and software behind it. This series is compiled in a complete Guide available here. This column focuses on five ways scale-up systems save money and improve TCO for in-memory computing.
Five Ways Scale-Up Systems Save Money
Although in-memory computing using scale-up systems usually has a higher upfront cost than scale-out systems, the Total Cost of Ownership (TCO) tells a different story. There are many additional costs associated with delivering an effective HPC or analytics resource on a day-to-day basis. These costs may include direct operating expenses as well as administration and user time, and the frustration that can slow or even impede the discovery process.
1. Lower System Administration Costs
System administration is a key aspect of all large-scale HPC and analytics systems. One key question to consider is how administration needs scale with the computing resource. That is, as the system grows, does the administration cost grow as well?
For in-memory scale-up systems the answer is “very little.” Scale-up systems are managed as a single system, where adding resources (more storage, processors, memory) contributes to the overall system in a transparent fashion. Administration of these systems is not much different from that of a typical multi-core web server, where all resources are managed by a single instance of the operating system (usually Linux).
Scale-out systems, on the other hand, can easily create more administration needs as the cluster operates or expands. The main reason for this additional overhead is the management of multiple operating system instances (one for each node of the cluster). While there are tools that help manage these systems, keeping them consistent over time can become a challenge. In addition, the interconnect is most often based on a high performance InfiniBand network. Where Ethernet is plug-and-play, InfiniBand is often considered a configure-and-confirm component for many clusters, and additional administration skills are often required to support it. Many scale-out systems also have heterogeneous node types. Sometimes this is by design and other times it occurs when clusters are expanded. The administrator must account for and partition this variability using a workload scheduler.
When considering administration costs, scale-up systems are much less expensive to maintain and manage. In particular, a recent IDC study found that scale-out systems cost 43% more in terms of the IT staff time needed to manage and implement all aspects of datacenter operations for the workload.
2. Difficulty and Cost of MPI Conversion
There are many applications and situations where MPI is not a feasible solution. Thus, many users need a scale-up environment where legacy code can run unmodified. These situations include:
- Large legacy codes that are too difficult to update (too expensive to port millions of lines of code to MPI)
- Proprietary code that cannot be modified by the user
- Applications whose life cycle does not merit a large MPI development effort
For example, the cost to port and verify a working application (e.g., a large legacy Fortran program) on a scale-out platform is quite high and may actually exceed the entire cost of the hardware.
3. Better Utilization of Resources
Virtually all HPC and analytics systems are designed to be used as a shared resource. A good measure of shared-resource efficiency is user throughput, or utilization. The ideal case is a fully utilized system with no idle resources. In reality, perfect utilization is not possible, but maximizing the number of jobs moving through the system delivers the best return on a high performance system.
A scale-up system will always have a utilization advantage over scale-out systems, which often translates to better throughput per dollar.
In-memory computing with scale-up systems also offers better utilization than scale-out systems because all processor cores use the same memory. Memory and core use can be managed from a global perspective by the resource manager. Because a scale-out cluster is composed of many discrete processor/memory domains, it is much more difficult to achieve optimal utilization (i.e., an application may need all the cores on a cluster node but not all the memory, or vice versa, leaving the excess resources unusable).
4. Software Capability with Incremental Scalability
In-memory systems have the advantage that virtually all software will run unmodified on the first pass. Extending the memory footprint at runtime or programmatically is simple with scale-up systems. Attempting to increase the memory footprint on a scale-out system eventually hits the limits of a single node, and the application must be modified to operate across several cluster nodes. These modifications may involve trial-and-error adjustments at run time.
With scale-up systems, adding more cores is an incremental process, unlike scale-out applications, where parallelism must be explicitly coded into the application. In a scale-up environment, existing sequential applications can be incrementally improved using tools like OpenMP; in a scale-out environment, a fully working MPI program must be created before the application can be scaled out.
As noted in Table One, existing MPI codes can run on a scale-up system and, in one sense, in-memory computing provides the best of both worlds where virtually any application (Legacy, OpenMP, MPI, etc.) can run on a scale-up machine.
In terms of commercial codes, a single scale-up machine can help optimize software-licensing costs. Application licenses are sometimes priced on a per-CPU or per-machine basis, and scale-out server environments can require additional licenses to satisfy user needs.
From a user's perspective, in-memory computing increases productivity and keeps efforts focused on the application rather than on software modifications.
5. Resource Productivity and System Upgrades
Unplanned downtime due to system, application, and upgrade outages, along with other periods of unavailability, translates into lost productive time for users. Here, scale-up systems delivered a 53% cost savings in terms of that lost productive time.
In addition, the global nature of scale-up systems means they are easily expandable. Additional storage, memory, and processors can be added and used immediately by the underlying operating system. There is no need to “configure” new nodes or install and test software, as is necessary with scale-out clusters.
The insideHPC series on in-memory computing also covered the following additional topics:
- In Memory Computing for HPC
- Scaling Hardware for In-Memory Computing
- Scaling Software for In-Memory Computing
You can also download the complete report, “insideHPC Research Report on In-Memory Computing: Five Ways In-Memory Computing Saves Money and Improves TCO,” courtesy of SGI and Intel.