insideHPC had a chance this week to sit down with the executives of the newly minted Whamcloud, Brent Gorda [CEO] and Eric Barton [CTO]. Many of you probably know Brent from his work within the US Department of Energy supercomputing circles. He’s also very active in organizing various technical and community events for the IEEE/ACM Supercomputing conference series. Eric Barton brings 25 years of development experience in supercomputing to the Whamcloud team. He has been working on Lustre since he was brought in to stabilize its network stack when the project first received DOE funding. Most recently he was a Principal Engineer at Sun/Oracle where he served as Chief Architect of the Lustre group.
As you may know, Whamcloud’s business model is centered on the Lustre parallel file system. But what exactly does this mean? Lustre is an open source project, managed and held by the Oracle Corporation via their acquisition of Sun Microsystems. Given that Oracle’s core business isn’t dependent upon Lustre, many folks with large-scale Lustre deployments have been worried about the progression of the code base. We wanted to dig a little deeper and find out exactly what Whamcloud is up to with respect to our little friend Lustre.
During the interview, Brent Gorda summed up their intentions best: “Reduce the complexity and increase the community.” Whamcloud intends to pour their own efforts into developing, hardening and improving what has become a real asset to the high performance computing community. They plan on doing so via code contributions to the root Lustre source tree. Unlike many other open source efforts that have become commercial products, they will not fork the source tree for their own endeavors. This is extremely important in building and maintaining their idea of community: Lustre is everyone’s Lustre.
So how does this affect their view of development? I asked Eric Barton what their three top goals were with respect to development. First, he said that Whamcloud is committed to working to improve the quality and stability of the code. Without a stable code base to work from, scalability is simply a pipe dream. This also implies de-prioritizing several of the features requested for the initial Lustre 2.0 release.
The second major development goal is to begin preparing for exascale deployments. This one really threw me for a loop. However, Eric is very grounded in his thinking when he explains why. Given that they want to always maintain the quality and stability of the file system, they need to begin to think intelligently about how to address systems with hundreds of thousands of nodes in the future. They want to ensure that these features make it into the code base gracefully, as opposed to dropping the features in the community’s cage all at once. Finally, he wants to make sure that the proper health and monitoring features gracefully make it into the source. Exascale means nothing if the platform can’t be kept stable long enough to run an application. A healthy system is a happy system.
So where is Oracle in all of this? Brent and Eric were very adamant that they do not intend to directly compete with Oracle. Oracle, via their inherited Sun support contracts, receives revenue based on the service and support of the Lustre file system. They both indicated that Whamcloud will carefully manage its relationship with, and impact on, Oracle. Whamcloud’s focus is Lustre on Linux for HPC, particularly the high end, whereas Oracle is more focused on commercial deployments. Whamcloud would rather be good stewards of the community and garner revenue through non-recurring engineering.
All in all, Whamcloud seems to be off to a raging start. They’re growing on a daily basis [up to 10 employees at the time of the interview] and they’ve already had significant interest from partners and potential customers. Lustre, until recently a damsel in distress, now has its knight in shining armor in Whamcloud.