Beyond Discoverability: Metadata to Drive Your Data Management

Terrell Russell from iRODS gave this talk at SC19. “The Integrated Rule-Oriented Data System (iRODS) is open source data management software used by research organizations and government agencies worldwide. iRODS is released as a production-level distribution aimed at deployment in mission critical environments. It virtualizes data storage resources, so users can take control of their data, regardless of where and on what device the data is stored.”
Panel Discussion: Metadata and Archiving at Scale

In this video from the HPC User Forum, Henry Newman from Seagate Government Solutions leads a panel discussion on Metadata and Archiving at Scale. “Metadata is the key to keeping track of all this unstructured scientific data. It is ‘data about data.’ It makes scientific data easy to find, track, share, move and manage – at low cost. Unfortunately, today’s high capacity storage systems only provide bare bones system metadata consisting of as little as file name, owner and creation/access timestamps. Data intensive scientific workflows need supplemental enhanced metadata, along with access rights and security safeguards.”
Metadata Used in Science

Metadata is the key to keeping track of unstructured scientific data. It is “data about data.” In the case of scientific data, it is structured data (written in a prescribed schema or order) that describes what the data is, how it was derived, and where it is located.
Video: Scalability Testing of DNE2 in Lustre 2.7
Keeping Up with the Growth of Scientific Data

“Metadata, or data about data, lets scientists find the valuable data they are looking for. Metadata especially helps find value in data that’s been created by others, no matter when or where. Without rich metadata, scientists increasingly risk spending their time just looking for data, or worse, losing it – instead of exploiting that data for analysis and discovery.”
Lustre Metadata Performance and Solutions from Seagate

“Alongside the increasingly high demands of streaming bandwidth in HPC storage solutions, there is a growing need for higher levels of metadata performance for various applications and workloads. The Lustre parallel file system provides a distributed namespace, divided across multiple metadata servers, that allows the metadata throughput to scale with increasing servers. This presentation addresses meeting the increasing requirements for high performance metadata in Lustre environments with the ultimate aim of reducing the time to results and improving overall efficiency.”
Tackling the Big Data Deluge with Metadata

Metadata makes scientific data much easier to find, both for an individual user and for an entire organization. A file system may keep track of only the file location, owner, and various timestamps. With a sophisticated metadata system, significantly more information can be stored along with the actual data itself. This allows for a more efficient workflow within an organization and enables better collaboration.
How Nirvana Software Provides Sophisticated Metadata Management

“By presenting a single global namespace across any storage device, anywhere in the world, Nirvana allows data to be easily and securely shared among globally distributed teams. Nirvana also automatically moves data to various workflow resources, based on policies so data is always available at the right place, at the right time, and at the right cost, while keeping an audit trail as data is ingested, transformed, accessed, and archived through its complete lifecycle.”