Can we use Machine Learning to Learn about Machines?


In this special guest feature, Ellexus CEO Rosemary Francis writes that the convergence of HPC and AI is already changing the landscape of supercomputing.

Rosemary Francis, CEO of Ellexus

AI and machine learning are the buzzwords of the moment in many sectors. We’re all very aware of just how much a company can predict about us by plugging information into a cunning algorithm, and thousands of start-ups are racing to use machine learning to create the next big-win app.

As the dust settles on the initial hype, we are starting to see some of the areas where this technology has real benefits. One of those areas has to be high performance computing (HPC) and the quest for exascale.

Machine learning can only get cleverer as the machines it runs on get faster. That means more and more industries are taking an interest in making supercomputers even more powerful, and investment in the exascale goal will increase.

In return, there are a number of areas of HPC that could benefit from machine learning: processes that can be improved and automation that can be made more fluid. Here are the top three areas that spring to mind.

Scheduling

In particular, I/O-aware scheduling. Some vendors are already looking into using machine learning to make better decisions about how to schedule applications. To derive the greatest benefit, more data has to be made available to the scheduler.

In the past, I/O-aware scheduling has fallen over because token-based systems and quotas are too simple and inflexible. In contrast, machine learning algorithms can cope with the complexity of real I/O workloads, which change over the lifetime of an application. The usually ill-defined goals of “full cluster utilisation” and “better performance” are not a problem for a system that can weigh more than one measure of success.
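As a thought experiment, here is a minimal Python sketch of what I/O-aware admission could look like once a model has learned to predict a job’s bandwidth demand from its history. Everything here is an illustrative assumption rather than any vendor’s scheduler: the job features, the gradient-boosted regression model and the 20 GB/s filesystem ceiling are all made up for the example.

```python
# A minimal sketch of I/O-aware admission, not a real scheduler.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Toy job history: [requested_cores, data_read_GB, runtime_s] (hypothetical features)
X_hist = rng.uniform([1, 1, 60], [512, 2000, 7200], size=(500, 3))
# Synthetic target: observed read bandwidth in MB/s (made-up relationship plus noise)
y_hist = X_hist[:, 1] * 1024 / X_hist[:, 2] + rng.normal(0, 50, 500)

model = GradientBoostingRegressor().fit(X_hist, y_hist)

def admit(queue, ceiling_mb_s=20_000):
    """Greedily start queued jobs while the predicted aggregate I/O load
    stays under the shared filesystem's bandwidth ceiling."""
    running, load = [], 0.0
    for job in sorted(queue, key=lambda j: j["priority"], reverse=True):
        demand = float(model.predict([job["features"]])[0])
        if load + demand <= ceiling_mb_s:
            running.append(job["name"])
            load += demand
    return running, load

queue = [
    {"name": "cfd_run", "priority": 5, "features": [256, 1500, 3600]},
    {"name": "genome_align", "priority": 3, "features": [64, 1800, 1800]},
    {"name": "post_proc", "priority": 1, "features": [8, 200, 600]},
]
print(admit(queue))
```

The design point is that the model’s output is a continuous estimate rather than a fixed token count, so the same mechanism can absorb new objectives, such as latency or metadata load, simply by predicting more than one number per job.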

In a world that is increasingly I/O-bound, I/O-aware scheduling is no longer something we can do without. It will only become more important as we head towards exascale.

Procurement

The elephant in the room at any meeting to design the next supercomputer is the question: “How can I specify what I want tomorrow when I don’t know what I need today?”

No algorithm is ever going to predict the future, but getting a handle on areas of variation and variability is a good start. This is where machine learning comes into play: it can certainly highlight what is correlated and what isn’t. That is a good step towards understanding what will change, how it will change and what we still don’t have a clue about.
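To make that concrete, here is a small Python sketch of the correlation step on synthetic job telemetry. The metric names and the relationships baked into the data are hypothetical; the point is only that a correlation matrix separates the metrics that move together from the ones that have to be forecast, and therefore provisioned, independently.

```python
# A minimal sketch: pairwise correlations across historical job telemetry.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1000  # one row per historical job

core_hours = rng.lognormal(6, 1, n)
telemetry = pd.DataFrame({
    "core_hours": core_hours,
    "peak_mem_gb": core_hours * 0.02 + rng.normal(0, 5, n),  # tracks compute
    "read_gb": rng.lognormal(4, 1.5, n),                     # independent of compute
    "write_gb": rng.lognormal(3, 1.5, n),
    "metadata_ops": rng.poisson(5000, n),
})

# Strongly correlated metrics can be sized together in a procurement spec;
# uncorrelated ones are separate axes of uncertainty to hedge against.
print(telemetry.corr().round(2))
```

In this toy data, memory tracks compute while I/O volumes do not, which is exactly the kind of structure you would want to know before writing a hardware specification.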

Tuning

There are some cool start-ups working on AI-driven tuning and optimisation, Concertio being just one. Making everything run faster by measuring what has been done in the past and tuning for it is not new, but the approach keeps getting more sophisticated.
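As a toy illustration of the measure-then-tune loop, here is a short Python sketch that random-searches a small configuration space and keeps whatever ran fastest. The tunables and the benchmark are placeholders rather than Concertio’s product or real system knobs; a production tuner would adjust live OS and firmware settings and learn a far smarter search strategy than random sampling.

```python
# A minimal sketch of measurement-driven tuning via random search.
import itertools
import random
import time

# Hypothetical tunables; real tuners adjust OS, runtime or firmware settings.
SPACE = {
    "prefetch_depth": [1, 2, 4, 8],
    "io_block_kb": [64, 256, 1024],
    "threads": [2, 4, 8],
}

def benchmark(cfg):
    """Stand-in for running and timing the real workload."""
    start = time.perf_counter()
    # Simulated cost model: bigger blocks, deeper prefetch and more threads help.
    work = 1e6 / (cfg["io_block_kb"] * cfg["prefetch_depth"] * cfg["threads"])
    sum(range(int(work)))
    return time.perf_counter() - start

def tune(trials=20):
    """Try a random subset of configurations, keep the fastest."""
    configs = [dict(zip(SPACE, vals)) for vals in itertools.product(*SPACE.values())]
    best_cfg, best_t = None, float("inf")
    for cfg in random.sample(configs, min(trials, len(configs))):
        t = benchmark(cfg)
        if t < best_t:
            best_cfg, best_t = cfg, t
    return best_cfg, best_t

print(tune())
```

Swapping the random sampler for a model that predicts which untried configuration is most promising is what turns this loop into machine learning proper.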

The interesting part comes when you use machine learning to tune the hardware and optimise it for another machine learning workload. It’s not quite machines designing machines; it’s more like the assembly line in a car factory. But we can all see how the possibilities open up for hardware and software as the pace picks up.
