Today Whamcloud announced that the company will be stepping up to help produce the next release of the Lustre file system, version 2.1. In light of the news first reported here that Oracle has ceased development of Lustre, getting the next release out has been a big concern for the Lustre community.
I caught up with Whamcloud CEO Brent Gorda to learn more.
insideHPC: What did you announce today?
Brent Gorda: Whamcloud recently talked about a few high profile hires that we feel round out our staff and put us in a leadership position. We are now stepping up to help the community move forward and collaboratively produce a “community release” of Lustre 2.1. Note that this is a community effort: the community is helping with the next release, which involves taking the code from the canonical tree, testing, applying fixes, and providing the result to the community in the best shape possible. Of course, any code modifications will be provided back upstream to the canonical tree in proper form for inclusion there.
insideHPC: Everyone seems to be afraid of a fork in the Lustre code. Is that where this is headed?
Brent Gorda: Fears of a fork in the Lustre code are unfounded. These fears appear to stem from observing that there are now a number of user groups (at least three) and new corporate entities (at least two including us). If you listen to these groups, none have suggested they want to fork the code. I believe we have heard just the opposite sentiment expressed by each group. In reality the people who work on the code, the developers, have not changed. They are the same people still working together, productively, on the same open source project. These individuals may be employed by different companies in some cases, but they continue to do technical work in a collaborative and cooperative way. They are not interested in forking the code either.
insideHPC: What about Oracle? They own the IP but they reportedly have ceased development of Lustre. Does a release need their blessing?
Brent Gorda: Oracle does not need to bless the community to move forward in this manner and neither should they be bothered by it. Oracle has rightfully acquired the IP and so of course they own it. But it is open source and the community can just move it forward. The lack of communication from Oracle has everyone concerned about the technology, the company, the people. There is no evidence that Oracle wishes to preclude the community from access to or use of Lustre. In fact it would seem the opposite is true. For this entire time, Oracle has continued their community service by providing gatekeeping, testing and release activities.
insideHPC: The people I talk to in the Lustre community seem to agree on one thing: that a new release needs to come out. How has the community responded to this announcement so far?
Brent Gorda: The community has indeed been asking for a new release and looking for someone to lead that effort. We have received a number of offers of help from individuals and companies who are well qualified to make that offer. I expect the number of volunteers will grow as word gets out that the activity is starting up. It is exactly what one would expect a community to do and really underscores that the threat of a fork is overblown.
insideHPC: Going forward, do you think Lustre code will be developed in a similar fashion to Mozilla or Apache? What is the right model?
Brent Gorda: This is a hard question to answer because there are many possible correct responses. One of the common signs of success in other projects is that deep in the core of the activity there are a small number of gatekeepers. These are highly skilled technologists who work closely together, know one another well and share a tight bond based on mutual respect and trust. Note that in most cases these people actually work for different companies, even competitors. It does not matter who writes the paycheck for these people. What matters is they care deeply about doing the right thing with the product. Whatever model we settle on will preserve these core strengths. We’ve worked up a white paper on this very topic covering details around community development strategies and challenges, maintaining quality, and other relevant topics.
insideHPC: Testing and certification are critical to a real-time file system like Lustre. I’m told deployment on big systems is the best way to shake out the bugs. How will Whamcloud help that process?
Brent Gorda: It is true that large-scale deployments are key to finding and fixing issues. In fact Sun (and now Oracle) have spent a huge amount of time and effort over the past few years mostly doing testing and hardening of Lustre 1.8 and 2.0. The Hyperion machine at Lawrence Livermore National Laboratory (LLNL) has been an incredible resource for this, as have large systems at Oak Ridge National Laboratory (ORNL) and other sites. Whamcloud has been socializing the idea of a community test infrastructure (software) – an automation capability that would make it easier to exploit test resources, ensure consistent testing, and produce a test results database to help qualify code contributions and monitor convergence on stability. The thinking is that if we make it simple to test Lustre, it will be easier for the community to contribute resources and more will be offered.
insideHPC: You have some notable contracts now with a couple of the National Labs. Will the releases you put out be driven by their particular needs?
Brent Gorda: The specific release we are talking about here is a community release, not one driven by a single user group. In general, however, the national labs’ interests in making petascale systems more pedestrian are key to enabling the community uptake in these systems. As the labs move on to multi-petascale and even exascale, they are funding the research and development that will trickle down to the wider community systems of the future. Lustre was envisioned by a few very smart people, namely Peter Braam (CFS) and Mark Seager (LLNL). It was funded to a large degree by the labs’ “Path Forward” program. The fact that the national labs still care enough about the technology to continue to put millions into it is a testament to the technology, the visionaries who started the project, and the people who continue to be behind it today.
insideHPC: The Europeans have formed their own consortium around Lustre. Do they have a seat at the table here as well?
Brent Gorda: Absolutely, yes. You may recall that up until very recently, the Europeans – specifically the French Alternative Energies and Atomic Energy Commission (CEA) – had the most capable Lustre installation anywhere. Whamcloud has participated in the European meetings and has joined the European Open Filesystems Cooperative Society (OFS SCE) as a founding member. The European group is serious and has a large number of well-qualified members who will add greatly to the project.