Improving access to archived data sets

May 2, 2007 by

John’s comment about “write-once/read-never” data on the Astoria Data Services post reminded me of some work being done at the University of Maryland’s HPSL lab to improve access to scientific data archives:

Active Data Repository (ADR) is a kind of database system for large multidimensional datasets. It lets you efficiently access spatial and temporal sub-ranges of your data using indexes, query planning, and optimization.

DataCutter extends the ADR concept to shared, distributed systems where datasets might be spread across different locations and data-processing resources may also be distributed.

Existing self-describing file formats already have some features of database systems. What is generally missing is the ability to grab arbitrary subsets of data from an archived file without suffering the latency of transferring the entire file from tape to a compute server. That ability would help ease the pain of actually doing something with all this data we’re piling up.
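To make the subset idea concrete, here is a minimal sketch of index-driven sub-range access. It is not ADR's or DataCutter's actual API; the `Chunk` layout, the `build_index` and `chunks_for_range` helpers, and the fixed-size-record assumption are all hypothetical, chosen only to illustrate how a small index can tell you which byte ranges of an archived file a query really needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One stored block of a 1-D time series (hypothetical layout)."""
    start: int   # first time step covered (inclusive)
    stop: int    # last time step covered (exclusive)
    offset: int  # byte offset of the block inside the archive file
    length: int  # size of the block in bytes


def build_index(n_steps, steps_per_chunk, bytes_per_step):
    """Build a chunk index, assuming fixed-size records stored in time order."""
    index = []
    for start in range(0, n_steps, steps_per_chunk):
        stop = min(start + steps_per_chunk, n_steps)
        index.append(
            Chunk(start, stop, start * bytes_per_step, (stop - start) * bytes_per_step)
        )
    return index


def chunks_for_range(index, lo, hi):
    """Return only the chunks that overlap the half-open range [lo, hi)."""
    return [c for c in index if c.start < hi and c.stop > lo]


# A 1000-step series stored as 100-step chunks of 8-byte values.
index = build_index(n_steps=1000, steps_per_chunk=100, bytes_per_step=8)

# Ask for time steps 250..419: only the overlapping chunks need to be fetched.
needed = chunks_for_range(index, 250, 420)
bytes_needed = sum(c.length for c in needed)
total_bytes = sum(c.length for c in index)

print(f"fetch {len(needed)} chunks, {bytes_needed} of {total_bytes} bytes")
```

In this toy case the query touches three of the ten chunks, so less than a third of the file would have to come off tape. A real system would use a spatial index over multidimensional chunks rather than a linear scan, but the principle is the same.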
