In this video from LUG 2015 in Denver, Kyle Lamb, Michael Masson, and Susan Coulter from LANL present: There and Back Again – The Battle of Lustre at LANL.
“A little over a year ago, LANL’s HPC Division purchased and fielded our first general-purpose InfiniBand-based Lustre parallel file system. This new Lustre deployment, the first of several similar planned deployments, gave us the opportunity to design a new storage backbone from the ground up and to gain in-depth experience with and insight into Lustre technology in order to facilitate the installation and configuration of future systems. These systems needed to be mounted on a variety of clusters via LNET over both Ethernet- and IB-connected infrastructures. One primary design question was whether or not to use Fine Grained Routing (FGR), which, because of the conceptual similarities between our existing Ethernet Parallel Scalable Backbone (PaScalBB) and Lustre FGR, promised some clear and easily identified advantages. However, the complexity of the underlying technology and the implementation details were less well known. This presentation discusses the decision points that led to an FGR implementation, lays out the analysis that supported that deployment, and examines the resulting pros and cons of that decision. Deployment of the second system uncovered additional implementation challenges. This case study includes a discussion of how these additional issues were addressed, as well as how these discoveries have informed our plans and preparations for the third system, slated for deployment late in 2015.”