In this video from SC14, Patrick Wohlschlegel from Allinea Software demonstrates how the company’s Performance Reports tool has been integrated into HP’s cluster management system for system administrators.
insideHPC: So, what are we looking at here?
Patrick Wohlschlegel: So, here we are looking at the work that Allinea has done with HP. Basically, we have an ongoing collaboration with HP in which we integrate our tool, Allinea Performance Reports, inside HP’s cluster management system, HP CMU, which is designed for system administrators. With CMU, HP provides system administrators with an understanding of what’s going on in the cluster: which nodes are used, is the cluster running efficiently, is there something broken? And we figured that if we could bring HP users a complementary view with Allinea Performance Reports, in which we can see the performance and the efficiency of the application, not from the system perspective but from the application perspective, we could do something really, really cool that will help all the HP customers.
So, basically what we have done, and we can see this on the small screen, is that inside the tool, HP CMU, we have created a connector through which we can access Allinea Performance Reports very easily. Here we have a list of jobs that have been executed on the system. If we have a look at those jobs, we can see all the metrics present from the HP CMU system. For this run in particular, we had executed Performance Reports, and by clicking in the menu inside HP CMU, we can show very easily the performance of the application and how it has been running on the cluster. Our tool includes multiple metrics about the CPU, the communications, and the I/O, to see if things are efficient, or if things are broken or running well. And here, for instance, we figured that in this example we have been running an application with OpenMP, and we see inside this report that we have lots and lots of activity which is not actually calculation, but switching between different contexts because of the OpenMP.
So, Allinea Performance Reports provides this information with those raw metrics, but more importantly, it provides hints and specific text telling people what the problems are. Here we can read that the involuntary context switch rate is high and that multiple threads may be sharing one core. Well, that rings a bell, so how could we change this? We could simply change the way the application has been executed, and we’re doing this in a different run here, where we have just changed the MPI command. We gathered information with Performance Reports again, and here we can see that a run that was taking seven minutes has been reduced to two minutes, just by changing a little in the command line. So, this has a lot of implications for the system administrators and the people who own the clusters, because we ensure that they’re running more applications, more jobs, within a single time frame, which increases the system usage, the productivity, and the return on investment when they are purchasing an HP cluster.
insideHPC: Very nice.
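Editor’s note: The interview does not say exactly what was changed on the MPI command line, so the following is only a rough sketch of the reasoning behind the hint Wohlschlegel describes. It assumes a hybrid MPI/OpenMP job and checks whether the requested ranks and threads exceed the cores on a node, which is the situation where multiple threads share one core. The function names and the node sizes in the usage lines are hypothetical; only the seven-minute and two-minute run times come from the demo.

```python
def threads_per_core(ranks_per_node, threads_per_rank, cores_per_node):
    """Average number of software threads competing for each core."""
    return ranks_per_node * threads_per_rank / cores_per_node


def is_oversubscribed(ranks_per_node, threads_per_rank, cores_per_node):
    """True when the job asks for more threads than the node has cores.

    Oversubscription forces the operating system to time-slice threads
    on the same core, which shows up as involuntary context switches.
    """
    return ranks_per_node * threads_per_rank > cores_per_node


def speedup(before_minutes, after_minutes):
    """Ratio of the original run time to the improved run time."""
    return before_minutes / after_minutes


if __name__ == "__main__":
    # Hypothetical 16-core node: 2 ranks x 16 threads is oversubscribed,
    # while 2 ranks x 8 threads fits exactly.
    print(is_oversubscribed(2, 16, 16), threads_per_core(2, 16, 16))
    print(is_oversubscribed(2, 8, 16), threads_per_core(2, 8, 16))

    # Run times quoted in the interview: seven minutes down to two.
    print(speedup(7, 2))
```

With the run times quoted in the demo, the speedup works out to 3.5x, which is the basis for the claim that more jobs fit into the same time frame.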