In this special guest feature, Ken Strandberg offers this live report from Day 2 of the Lustre User Group meeting in Portland.
John White of Lawrence Berkeley National Laboratory presented his work on ‘condoizing’ Lustre. Condoizing is a win-win for the researcher and the rest of the institution, but how do you condoize Lustre? John’s work reveals some of the issues his team ran into and some possible solutions.
Scott Yockel from Harvard University shared how they are deploying Lustre across three data centers up to 90 miles apart, with 25 PB of storage, about half of which is Lustre. They’re using Docker containers and a backup strategy that spans those miles, covering every NFS system, parsing the entire MDT, and including 10k directories of small files.
Michael Kluge from Technische Universität Dresden and Johan Peyrard from Atos presented a build of a NetApp-based storage system that delivered 1.3 million IOPS from memory, using 609 nodes. Their design achieved over 100,000 write IOPS per node, with peak performance reached using just a few nodes and stable performance retained as the system scaled.
SSDs were central to the talk by James Coomer of DDN, Accelerating Lustre with SSDs and NVMe. While SSDs are still pricey for large file storage systems, Coomer showed how data centers can optimize data for SSDs and optimize SSDs for data with a modest investment in SSDs, intelligent policy-driven data management, and block-level and Lustre object-level data placement schemes.
J. Ray Scott shared the Pittsburgh Supercomputing Center’s experience running Lustre on the Intel® Omni-Path Architecture fabric on their latest system, Bridges. Untuned read performance with the fabric can theoretically peak at about 26 GB/second, with averages of about 16 GB/second achieved, using 8 nodes and two OSSs.
Dave McMillen and Steve Woods from Cray illustrated how InfiniBand can be extended to many kilometers. Up to 10 km is a straightforward deployment, but longer distances require careful attention to keeping the pipelines full. Nonetheless, their solutions are in production today.
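The pipeline-filling concern can be illustrated with a bandwidth-delay product calculation. The sketch below is not from the Cray talk: the fiber delay of roughly 5 microseconds per kilometer and the 100 Gb/s link rate are assumed values chosen only for illustration, and `bytes_in_flight` is a hypothetical helper name.

```python
# Assumed values for illustration only; none of these come from the talk.
FIBER_DELAY_US_PER_KM = 5.0  # approximate one-way propagation delay in optical fiber


def bytes_in_flight(distance_km: float, link_gbps: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link busy."""
    # Round-trip time in seconds for the given one-way distance.
    rtt_s = 2 * distance_km * FIBER_DELAY_US_PER_KM * 1e-6
    # Convert the link rate from gigabits/second to bytes/second, then multiply by RTT.
    return link_gbps * 1e9 / 8 * rtt_s


if __name__ == "__main__":
    for km in (1, 10, 100):
        mb = bytes_in_flight(km, 100.0) / 1e6
        print(f"{km:>4} km: {mb:.2f} MB must be in flight")
```

Under these assumptions the amount of unacknowledged data needed to keep the link saturated grows linearly with distance, which is why buffering matters more as links get longer.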
Olaf Weber from SGI described a collaboration between SGI and Intel to develop a multi-rail LNet for very large Lustre nodes, such as a 256-socket NUMA system. Without a multi-rail implementation, multiple LNet networks are needed to support the bandwidth of large nodes. A multi-rail LNet, however, will allow the network configuration to match the fabric. Project status is available at wiki.lustre.org/Multi-Rail_LNet.
Two projects are tackling scalability. Nicholas Chaimov from University of Oregon is working on Scaling Apache Spark on Lustre. Using their methods, his team saw plain Spark on Lustre scale considerably, to about 10,000 cores. Artem Blagodarenko from Seagate is looking at scaling Lustre’s long-standing LDISKFS beyond the 256 TB limit.
Several presentations at the conference dealt with backing up the file system. Shinji Sumimoto and his Fujitsu team chose to focus on a directory-level and user-level backup method. They enable users to create their own directory backups, manage their snapshots, and merge them.
Tomorrow is the final day of the conference.