All about Baselining: RedLine Explains HPC Performance Methodology

In this special guest feature, Dan Olds from OrionX catches up with RedLine Performance Solutions to talk workload optimization.

In HPC we talk a lot about performance, and vendors are constantly striving to increase the performance of their components, but who out there is making sure that customers get the performance that they’re paying for? Well, according to their recently published ebook, a company called RedLine Performance Solutions has adopted that role with gusto.

I’m so impressed by RedLine that we selected them (begged and pleaded, too) to do the performance tuning on our Cyclops machine.

RedLine is a 22-year-old company that started with a passion for supercomputing and evolved with the idea that many customers need help to manage and optimize their HPC systems and applications in order to get the most out of them. What really caught my eye is their methodical approach to systems and performance management and monitoring.

They lay out five disciplines that customers should follow in order to maximize their system performance and reliability.

At the foundation of their methodology is baselining. This means running a set of applications to rigorously test compute, I/O, storage, and application performance. You want to run these at the node level and also at the cluster level, just to make sure that everything is performing as it should.

Once you have a record of your results, you run the same tests before and after each change to the system, whether it’s a new CPU, an application patch, or a new version of the OS. This will help identify any performance problems arising from the change well before users get a chance to complain. RedLine also advises clients to run baseline tests when trying to figure out the root cause of a performance problem.
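To make the before-and-after idea concrete, here is a minimal sketch of how two baseline runs might be compared. It is not RedLine's tooling: the metric names, the sample numbers, and the 5 percent tolerance are all made up for illustration, and it assumes every metric is "higher is better."

```python
from dataclasses import dataclass


@dataclass
class Regression:
    metric: str
    before: float
    after: float
    change_pct: float


def find_regressions(before, after, tolerance_pct=5.0):
    """Flag metrics that dropped by more than tolerance_pct.

    Assumes higher values are better for every metric (an illustrative
    simplification; latency-style metrics would need the opposite test).
    """
    regressions = []
    for metric, old in before.items():
        new = after.get(metric)
        if new is None or old == 0:
            continue  # metric not re-measured, or no meaningful percentage
        change_pct = (new - old) / old * 100.0
        if change_pct < -tolerance_pct:
            regressions.append(Regression(metric, old, new, change_pct))
    return regressions


# Hypothetical results recorded before and after an application patch.
baseline = {"stream_triad_gbs": 180.0, "hpl_gflops": 2400.0, "ior_write_gbs": 12.0}
after_patch = {"stream_triad_gbs": 178.0, "hpl_gflops": 2100.0, "ior_write_gbs": 12.1}

for r in find_regressions(baseline, after_patch):
    print(f"{r.metric}: {r.before} -> {r.after} ({r.change_pct:.1f}%)")
```

With these sample numbers only the compute metric falls outside the tolerance, which is the kind of signal you would want before users notice anything. The same comparison could be run per node and again for the whole cluster.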

The company also helps customers select lightweight systems monitoring tools that are several steps beyond the usual ‘idiot light’ systems monitoring. The tools they recommend will also help flag failing components and isolate potential problems before they become issues for end users.

RedLine applies the same philosophy to performance monitoring tools:

“Performance monitoring provides insights into system health that basic system monitoring just can’t reach.” According to RedLine, the key to performance monitoring is having a deep knowledge of how system components and software interact and where the bottlenecks occur. Baselining and tracking performance data over time are a big part of ensuring maximum performance.

Change management also comes in for a lot of attention from the company. According to them, it’s inevitable that changes will at one time or another break customer systems. Good change management will reduce the chances that changes will take down your systems and also help bring them back up much faster.

However, a big part of change management is getting the people involved to adhere to customer change policies. RedLine draws a line in the sand by saying “there should be zero tolerance for unauthorized change” and recommending tools that will root out unauthorized changes and even reverse them.
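The tools RedLine recommends aren't named here, but the basic idea of rooting out unauthorized change can be sketched in a few lines: record a checksum of each managed file when a change is approved, then compare later. The file name and contents below are hypothetical stand-ins, and this sketch only detects drift; it does not reverse it.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(paths):
    """Map each file path to the SHA-256 digest of its contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def changed_files(approved, current):
    """Return paths that were modified, added, or removed since approval."""
    all_paths = set(approved) | set(current)
    return sorted(p for p in all_paths if approved.get(p) != current.get(p))


# Demo on a throwaway file standing in for a managed config file.
with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp) / "scheduler.conf"       # hypothetical file name
    conf.write_text("max_jobs=100\n")
    approved = snapshot([conf])               # recorded when the change was authorized
    conf.write_text("max_jobs=500\n")         # an edit made outside the change process
    drift = changed_files(approved, snapshot([conf]))
    print(drift)                              # lists the path of the modified file
```

In practice the approved snapshot would live somewhere the people making changes cannot quietly rewrite, and a real tool would also track ownership, permissions, and package versions.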

Performance tuning is also an art form that RedLine has mastered. They point out that tuning must be approached with great care, since changes in individual systems will always impact both upstream and downstream systems. It’s also imperative that tuning done beyond the initial pre-production phase of the application lifecycle be accompanied by rigorous systems/performance management as described above.

RedLine’s Performance Methodology ebook is a highly readable and interesting document. It should open some eyes in the industry. You can read it here.
