An Update on MarFS in Production

In this video from the MSST 2017 Mass Storage Conference, David Bonnie from LANL presents: An Update on MarFS in Production.

MarFS is a near-POSIX file system that uses cloud storage for data and many POSIX file systems for metadata. Extreme HPC environments require that MarFS scale POSIX namespace metadata to trillions of files, and to billions of files in a single directory, while storing the data efficiently and in a massively parallel fashion in industry-standard, erasure-protected, cloud-style object stores.

“With MarFS in production at LANL since fall 2016, we have gained new insights, learned lessons, and expanded our future plans. We’ll discuss the various hurdles required to deploy such an ambitious system with minimal manpower. Further, we’ll delve into the challenges, triumphs, and defeats on the road to a new tier of inexpensive scalable storage.”

Many computing sites need long-term retention of mostly cold data, often in the form of “data lakes”. The main function of this storage tier is capacity, but nontrivial bandwidth and access requirements exist. For many years, tape was the most economical solution. However, data sets have grown faster than tape bandwidth has improved, and access demands have increased in the HPC environment. Disk can now be more economical for this storage tier. The cloud community has moved toward erasure-based object stores to gain scalability and durability using commodity hardware. The object interface works for new applications, but legacy applications rely on POSIX for their interface.

David Bonnie is a scientist with a background in scalable storage systems and software development. He contributed to the development of OrangeFS/PVFS2 and holds a deep interest in understanding current and future storage hardware and software. He is currently the campaign storage and enterprise backup technical lead for the HPC Division at Los Alamos National Laboratory, roles that both include leading integration, development, and production support efforts for LANL’s pre-archive and backup tiers. He is integral to the development and deployment of MarFS, LANL’s new filesystem designed to serve the needs of ever-growing datasets with spiraling bandwidth and reliability challenges. David holds a B.S. and M.S. in Computer Engineering, both from Clemson University.

See more talks in the MSST 2017 Video Gallery

Check out our insideHPC Events Calendar