In this video from the HPC User Forum, Henry Newman from Seagate Government Solutions leads a panel discussion on Metadata and Archiving at Scale.
Metadata is the key to keeping track of unstructured scientific data. It is “data about data.” In the case of scientific data, it is structured data (written in a prescribed schema or order) that describes what the data is, how it was derived, and where it is located. It makes scientific data easy to find, track, share, move, and manage at low cost. Unfortunately, today’s high-capacity storage systems provide only bare-bones system metadata, consisting of as little as file name, owner, and creation/access timestamps. Data-intensive scientific workflows need supplemental, enhanced metadata, along with access rights and security safeguards. Workflow constituents can then find and access valuable data by querying this enhanced metadata. With the increasing data deluge across all scientific domains, rich workflow-specific metadata is essential to enable collaborators to find and share valuable data crucial to their endeavors.
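To make the contrast concrete, here is a minimal Python sketch, not taken from the panel or the white paper. `system_metadata` reads the bare-bones fields a file system records through the standard `os.stat` call. The `ScienceMetadata` record, its field names, the `find_by_tag` helper, and the sample paths are all hypothetical, chosen only to illustrate "what the data is, how it was derived, and where it is located."

```python
import os
from dataclasses import dataclass, field


def system_metadata(path):
    """Return the bare-bones metadata a file system typically records."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "owner_uid": st.st_uid,    # numeric owner id (0 on Windows)
        "modified": st.st_mtime,   # a portable creation time is not available
        "accessed": st.st_atime,
    }


@dataclass
class ScienceMetadata:
    """Hypothetical enriched record; the field names are assumptions."""
    description: str   # what the data is
    derived_from: str  # how it was derived
    location: str      # where it is located
    tags: set = field(default_factory=set)


def find_by_tag(catalog, tag):
    """Return the locations of all records carrying the given tag."""
    return [rec.location for rec in catalog if tag in rec.tags]


# Illustrative catalog with made-up entries.
catalog = [
    ScienceMetadata("ocean temperature grid", "model run A",
                    "/archive/ocean.nc", {"climate", "ocean"}),
    ScienceMetadata("tumor sequencing reads", "sequencer batch 7",
                    "/archive/reads.bam", {"genomics"}),
]
```

With a catalog like this, a collaborator can ask for everything tagged `"climate"` and get back locations, a question that file name, owner, and timestamps alone cannot answer.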
Panelists (from left):
- Gary Grider, LANL
- Amanda Tumminello, Navy DoD Shared Resource Center
- Jack Collins, National Cancer Institute
- Kirill Malkin, HPE
- Frank Herold, Thinkparq/BeeGFS
- Terrell Russell, RENCI
- Frank Lee, IBM
Download the white paper: Metadata used in Science.