insideHPC: It’s great to talk to you once again. I was hoping we could get a preview of what you’re going to be presenting tomorrow at LAD ’14.
Eric Barton: We’ve been working on the FastForward Project for the last two years. One of the things we developed in it was scalable collectives for use inside server clusters. That really has the chance to change the game on Lustre RAS, on detecting the health of the whole server cluster, and to allow the server cluster to act more like a collection of cooperating servers, rather than just individual servers.
One of the real problems you have, and the thing we had to address for extreme scale, is that with the number of clients you can expect at that scale, service time can vary enormously: a single client will be served in microseconds, but when you load up with hundreds of thousands of clients, now you’re talking about hundreds of thousands of times that service latency.
Unfortunately, Lustre today uses service latency as its way of diagnosing client death. It’s very hard to distinguish: “Is my client actually alive, or is it just queuing up behind all the others?” Similarly, the client looks at the server and asks, “Is the server alive, or am I just queued up behind all these other clients?”
We need a better way of working out whether the cluster is being responsive to me, and also for the cluster to work out whether the clients are responsive. We’ve already had some talks today at LAD’14 about exactly this issue – that it’s crucial to be able to diagnose peer death accurately and promptly. If you don’t do it promptly, you’re stuck with these multi-minute timeouts, and if you don’t do it accurately, you’re aborting applications.
In the FastForward Project we developed a gossip protocol – effectively a nondeterministic broadcast – which lets all of the servers keep in touch with each other at very, very low cost. They ping each other about once every second, and within 10 pings or so, a thousand servers can know the status of every other server. That means you’ve got very low-latency notification, and on top of that – now that we know the health of the servers – we can use it to run collectives across the servers and share information very scalably. You can leverage that for much more robust Lustre communications.
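The spreading behavior Barton describes can be sketched with a toy push-gossip simulation (a hypothetical illustration, not the FastForward code): each round, every server that knows a piece of status pushes it to one randomly chosen peer. The informed set can at most double per round, so full dissemination across N servers takes roughly log2(N) rounds plus a short tail – on the order of 10 pings for a thousand servers, as described above.

```python
import random

def gossip_rounds(n_servers, seed=0):
    """Simulate push-gossip dissemination of one status update.

    Server 0 starts with a fresh status; each round, every server
    that already knows it pushes it to one random peer. Returns the
    number of rounds until every server knows.
    """
    rng = random.Random(seed)
    knows = [False] * n_servers
    knows[0] = True
    rounds = 0
    while not all(knows):
        rounds += 1
        # Snapshot the current knowers so an update spreads at most
        # one hop per round (i.e., per "ping").
        senders = [s for s in range(n_servers) if knows[s]]
        for s in senders:
            peer = rng.randrange(n_servers)
            knows[peer] = True
    return rounds

if __name__ == "__main__":
    print(gossip_rounds(1000))  # typically in the teens for 1000 servers
```

Since the informed set at most doubles each round, at least ceil(log2(1000)) = 10 rounds are needed for a thousand servers; random collisions and the last few stragglers add a modest tail on top of that.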
insideHPC: The output of FastForward – you built a prototype. Was this part of the work that you did?
Eric Barton: Yeah. We demonstrated it in FastForward, and what we want to do now is look at landing that code in the main Lustre tree and then developing these improved RAS features on the back of it.
In this video from the LAD’14 Conference in Reims, Eric Barton from Intel presents: Leveraging Fast Forward Collectives To Improve Lustre RAS.
* Download the Slides (PDF)
* Download the Fast Forward Storage and IO Program Final Report (PDF)
* See more talks in the LAD’14 Video Gallery