Brent Gorda
insideHPC had a chance this week to sit down with the executives of the newly minted Whamcloud, Brent Gorda [CEO] and Eric Barton [CTO]. Many of you probably know Brent from his work within the US Department of Energy supercomputing circles. He’s also very active in organizing various technical and community events for the IEEE/ACM Supercomputing conference series. Eric Barton brings 25 years of development experience in supercomputing to the Whamcloud team. He has been working on Lustre since he was brought in to stabilize its network stack when the project first received DOE funding. Most recently he was a Principal Engineer at Sun/Oracle where he served as Chief Architect of the Lustre group.
As you may know, Whamcloud’s business model is centered on the Lustre parallel file system. But what exactly does this mean? Lustre is an open source project, managed and held by the Oracle Corporation via their acquisition of Sun Microsystems. Given that Oracle’s core business isn’t dependent upon Lustre, many folks with large-scale Lustre deployments have been worried about the progression of the code base. We wanted to dig a little deeper and find out exactly what Whamcloud is up to with respect to our little friend Lustre.
During the interview, Brent Gorda summed up their intentions best: “Reduce the complexity and increase the community.” Whamcloud intends to pour their own efforts into developing, hardening and improving what has become a real asset to the high performance computing community. They plan on doing so via code contributions to the root Lustre source tree. Unlike many other open source efforts that have become commercial products, they will not fork the source tree for their own endeavors. This is extremely important in building and maintaining their idea of community: Lustre is everyone’s Lustre.
Eric Barton
So how does this affect their view of development? I asked Eric Barton what their three top goals were with respect to development. First, he said that Whamcloud is committed to working to improve the quality and stability of the code. Without a stable code base to work from, scalability is simply a pipe dream. This also implies de-prioritizing several of the features requested for the initial Lustre 2.0 release.
The second major development goal is to begin preparing for exascale deployments. This one really threw me for a loop. However, Eric is very grounded in his thinking when he explains why. Given that they want to always maintain the quality and stability of the file system, they need to begin to think intelligently about how to address systems with hundreds of thousands of nodes in the future. They want to ensure that these features make it into the code base gracefully, as opposed to dropping the features in the community’s cage all at once. Finally, he wants to make sure that the proper health and monitoring features gracefully make it into the source. Exascale means nothing if the platform can’t be kept stable long enough to run an application. A healthy system is a happy system.
So where is Oracle in all of this? Brent and Eric were very adamant that they do not intend to directly compete with Oracle. Oracle, via their inherited Sun support contracts, receives revenue based on the service and support of the Lustre file system. They both indicated that Whamcloud will carefully manage its relationship and impact on Oracle. Whamcloud’s focus is Lustre on Linux for HPC — particularly the high end — whereas Oracle is more focused on commercial deployments. Whamcloud would rather be good stewards of the community and garner revenue through non-recurring engineering.
All in all, Whamcloud seems to be off to a raging start. They’re growing on a daily basis [up to 10 employees at the time of the interview] and they’ve already had significant interest from partners and potential customers. Lustre, recently a damsel in distress, now has its knight in shining armor in Whamcloud.









