A day doesn’t go by without some reference to “Big Data”. As end-user organizations wrestle with the 4 V’s (volume, variety, velocity and veracity), little attention has been given to data that was created independently, often at great cost. Some of this data may be easily readable (e.g. spreadsheets, video), but massive amounts of data are not in an industry-standard format, having come from experiments or large-scale simulations that store results in specialized formats. In addition, long-term retention of data is at risk once it no longer has an owner or is no longer tied to an active investigation or project. Perhaps the most critical question is: “What is in this data file, and does it (or should it) mean something to me?”
Sequencers, cameras, scanners and a wide range of other sensors are generating exabytes of data. Today’s high-performance, high-capacity file systems can accommodate these massive volumes; however, identifying what the data is, how it can be used, or who needs to see it is not part of current file system design or capability.
Metadata is the key to keeping track of this data in all its volume and variety. Metadata makes finding scientific data much easier, both for individual users and for an entire organization. A file system may only keep track of a file’s location, owner and various timestamps. With a sophisticated metadata system, significantly more information can be stored alongside the data itself. This allows for more efficient workflows within an organization and enables better collaboration.
General Atomics has created the Nirvana Metadata Centric Intelligent Storage System. Nirvana is a software product that works alongside existing high-performance and high-capacity storage systems. Nirvana allows researchers to quickly locate data based on their needs or workflows. In addition, Nirvana can inventory existing data junkyards: the places where old (possibly useful) data from long-ago experiments or simulations has been deposited. The metadata and files are tracked in a relational database, giving users the ability to quickly search tremendous numbers of files and amounts of data for relevance. Eliminating duplicate files can also improve file system performance and reduce the need to purchase additional storage. The white paper Tackling the Big Data Deluge in Science with Metadata describes the Nirvana product in detail and explains why organizations would want to implement such a system. The paper covers both high-level end-user needs and implementation details, software workflows and components. You will quickly see how this software can enhance your organization’s use and reuse of data.
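To make the catalog-and-deduplicate idea above more concrete, here is a minimal Python sketch. It is not Nirvana’s implementation (the white paper describes that); it simply assumes a SQLite table with a hypothetical schema (path, size, mtime, sha256, project) filled by walking a directory tree. Files whose SHA-256 content hashes match are reported as duplicates, and the hypothetical `project` column stands in for the kind of extra metadata a plain file system does not record.

```python
import hashlib
import os
import sqlite3


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_catalog(root, db_path=":memory:"):
    """Walk `root` and record basic file-system metadata plus a content hash."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               path    TEXT PRIMARY KEY,
               size    INTEGER,
               mtime   REAL,
               sha256  TEXT,
               project TEXT  -- hypothetical extra metadata, set by users
           )"""
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            conn.execute(
                "INSERT OR REPLACE INTO files (path, size, mtime, sha256) "
                "VALUES (?, ?, ?, ?)",
                (path, st.st_size, st.st_mtime, file_digest(path)),
            )
    conn.commit()
    return conn


def find_duplicates(conn):
    """Return a list of path groups whose contents hash identically."""
    dup_hashes = [
        digest
        for (digest,) in conn.execute(
            "SELECT sha256 FROM files GROUP BY sha256 HAVING COUNT(*) > 1"
        )
    ]
    return [
        sorted(p for (p,) in conn.execute(
            "SELECT path FROM files WHERE sha256 = ?", (digest,)))
        for digest in dup_hashes
    ]


def tag_files(conn, paths, project):
    """Attach a (hypothetical) project label to already-catalogued paths."""
    conn.executemany(
        "UPDATE files SET project = ? WHERE path = ?",
        [(project, p) for p in paths],
    )
    conn.commit()


def files_for_project(conn, project):
    """Search the catalog by metadata rather than by directory location."""
    return sorted(
        p for (p,) in conn.execute(
            "SELECT path FROM files WHERE project = ?", (project,))
    )
```

Hashing whole files is the simplest correct duplicate test; a real system inventorying petabytes would likely compare sizes first and hash only files that share a size, but that refinement is left out of this sketch.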
By implementing the General Atomics Nirvana system, developed in conjunction with the San Diego Supercomputer Center, organizations can uncover hidden data, optimize workflows and improve data discovery. Nirvana is an easily deployed solution built around the needs of leading research organizations. Reading this white paper will give you the background on why this matters so much for leading-edge organizations, as well as the lower-level details of implementation and component architecture. Download it now!