How to Control Your Supercomputing Programs

This is the second article in a series taken from the insideHPC Guide to Production Supercomputing and Systems Management. This five-part series explains how properly managed HPC systems lower the total cost of ownership of your supercomputing programs. This article looks at the SGI Management Suite, which is designed to help you control your supercomputing programs.

The Ponemon study reveals that even more significant costs are incurred by organizations with revenue models that depend on the data center’s ability to deliver IT and networking services to customers. The highest cost of a single event in the study was more than $1.7 million.

Get the InsideHPC Guide to Production Supercomputing - Download it today

SGI has created a comprehensive suite of software tools that provide operating system provisioning, system health management and power management of SGI computer systems. The SGI Management Suite supports all SGI systems: SGI® ICE™, SGI® Rackable® and SGI® UV™ powered by Intel® Xeon® processors. It provides tools to monitor essential system metrics, initiate management actions and improve the overall power efficiency.

Management pressure for cost containment is answered by improving software maintenance procedures and automating many of the repetitive activities that have been handled manually. This lowers total cost of ownership (TCO), boosts IT productivity, and increases return on investment (ROI).

The SGI® Management Suite includes powerful software tools to facilitate:

  • High speed provisioning
  • Version-controlled image management
  • System health monitoring and management, including memory, CPU, and power usage on the motherboards

High Speed Provisioning

SGI Management Suite combines the discovery of HPC system nodes with multicast provisioning to significantly shorten the bare-metal provisioning time of large-scale HPC systems.

Servers are provisioned in parallel using multicast technology, which significantly decreases downtime during maintenance periods. Other benefits include rapid system installation and updating in minutes rather than hours or days; single provisioning sessions at large scale; and the archiving of multiple Linux operating systems that can be quickly provisioned on demand.
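
To see why parallel multicast provisioning can shorten an install window, it may help to run the numbers on an idealized model. The sketch below is not a measurement of the SGI Management Suite. It assumes a single provisioning server whose uplink is the only bottleneck, and the function names, the 512-node cluster, the 4 GB image, the 10 Gbit/s uplink, and the 1.2 overhead factor are all hypothetical values chosen for illustration.

```python
def unicast_seconds(nodes, image_gb, uplink_gbps):
    """Seconds to send one copy of the image per node through one server uplink."""
    total_gbits = nodes * image_gb * 8  # 1 GB = 8 Gbit (decimal units)
    return total_gbits / uplink_gbps


def multicast_seconds(image_gb, uplink_gbps, overhead=1.2):
    """Seconds to send the image once to a multicast group.

    `overhead` is an assumed multiplier for retransmissions and pacing,
    not a figure taken from any product.
    """
    return image_gb * 8 * overhead / uplink_gbps


if __name__ == "__main__":
    nodes, image_gb, uplink_gbps = 512, 4, 10  # hypothetical cluster
    unicast = unicast_seconds(nodes, image_gb, uplink_gbps)
    multicast = multicast_seconds(image_gb, uplink_gbps)
    print(f"unicast:   {unicast / 60:.1f} min")
    print(f"multicast: {multicast:.1f} s")
```

In this model the unicast time grows linearly with the node count, while the multicast time depends only on the image size and the link speed. Real systems add disk write time, boot time, and network effects that the model ignores.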

Version-Controlled Image Management

This capability, built into the SGI Management Suite, tracks the changes to the Linux operating system over time. The images, in RPM format, are easily deployed to servers in the system. If problems arise after an upgrade, the system can easily be returned to a known working state.

Version-controlled image management allows systems administrators to run various Linux operating systems to support a wide variety of user application requirements. It also reduces the risk of upgrading to a new Linux operating system, because the system can revert to the previously working software if problems occur.
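
The idea of tracking image changes and returning to a known working state can be illustrated with a small revision log. This is a minimal sketch under stated assumptions, not the SGI Management Suite interface: `ImageRevision`, `ImageHistory`, and their methods are hypothetical names, the package versions are made up, and the log lives in memory rather than in an RPM-based image store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRevision:
    number: int      # 1-based revision number
    note: str        # free-text description of the change
    packages: tuple  # sorted (package, version) pairs


class ImageHistory:
    """Hypothetical in-memory revision log for one OS image."""

    def __init__(self, name):
        self.name = name
        self._revisions = []
        self.deployed = None  # revision currently on the nodes, if any

    def commit(self, note, packages):
        """Record a new revision from a {package: version} mapping."""
        rev = ImageRevision(len(self._revisions) + 1, note,
                            tuple(sorted(packages.items())))
        self._revisions.append(rev)
        return rev

    def deploy(self, number):
        """Mark revision `number` as the one running on the nodes."""
        if not 1 <= number <= len(self._revisions):
            raise ValueError(f"no revision {number} for image {self.name!r}")
        self.deployed = self._revisions[number - 1]
        return self.deployed

    def rollback(self):
        """Redeploy the revision just before the deployed one."""
        if self.deployed is None or self.deployed.number == 1:
            raise RuntimeError("no earlier revision to roll back to")
        return self.deploy(self.deployed.number - 1)


if __name__ == "__main__":
    history = ImageHistory("compute")
    history.commit("baseline", {"kernel": "3.0.101", "openmpi": "1.6.5"})
    history.commit("kernel upgrade", {"kernel": "3.12.28", "openmpi": "1.6.5"})
    history.deploy(2)   # the upgrade goes live
    history.rollback()  # problems appear, so return to the baseline
    print(history.deployed)
```

Because every revision is kept rather than overwritten, a rollback in this sketch is just another deployment of an older, already recorded revision.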

In the next article we’ll look at Systems for Managing and Monitoring your Supercomputer. If you prefer you can download the complete insideHPC Guide to Production Supercomputing and Systems Management, courtesy of SGI and Intel – Click Here.