The performance-savvy HPC developer is in high demand today. Leaps in intra-node parallelism, memory performance and capacity are about to collide head-on with applications that already struggle to exploit existing systems.
I believe we’re entering a transformative period that calls for a step change in how we act on performance. HPC centers and their users frequently rely on one or two performance experts (or their system vendor’s experts!) to improve codes – and that’s not scalable: there are too many applications for the number of performance experts out there.
The more time I spend with developers and users, the more I realize we can move the work of improving application performance closer to them:
- Every developer can optimize – No one knows a code better than its author(s). Performance needs to be placed in their hands – before, if necessary, engaging the expert.
Provide performance tools that target the domain expert and developer – and that cover the whole picture from processor to OpenMP to multi-node MPI and I/O.
If a code isn’t profiled, it’s not efficient. If a system doesn’t have a profiler, its developer users cannot be efficient.
Profiling has real impact: One developer from industry described his first morning’s results of profiling using Allinea Forge in three words: “Bottleneck, bottleneck, bottleneck”.
Parts of that code were set for a rewrite – until Forge found they were not the problem. Weeks of time were saved by targeting the right parts of the code – and in less than a full day the scalability limit had been lifted by 10x.
- Always Be Benchmarking – a recent vendor presentation spoke of a 2x speedup of a code in one afternoon at a code dungeon. Great! Do we know how many other codes are still running at half speed?
Historically, performance has been hard to measure – so many users don’t do it. The result: less output from the cluster.
Accessible benchmarking for real code and workloads is the most effective way of ensuring a cluster is efficient.
Some sites are getting more science done by using Allinea Performance Reports to provide performance advice and measurement to their users transparently. No relinking, no instrumentation, no changes – just the answer they need.
- Train up the next generation – performance optimization needs to be instinctive for every new HPC developer. Educators must prepare computational scientists for the future. Let’s teach MPI and teach how to profile and debug – and professional development practices too. Through training at education events like XSEDE and RMACC or the Student Cluster Competition, we’re trying to spread that word.
It’s not about a specific tool, but about a methodology and best practice.
We hope we’ve made it easier. Allinea Forge completes the workflow of HPC developers by integrating debugging, profiling, editing, building and version control in one tool that even works transparently on remote machines. Just as surely as a developer must debug their code, they should be able to profile it, so we’ve made it possible to switch from debugging to profiling without jumping through hoops.
Modernization is about more than the code – and I think we’re in exciting, transformative times.
This article was submitted by David Lecomber, CEO, Allinea.