Data services


I thought this pointer from Dan Fay’s blog to Microsoft’s project “Astoria” was interesting and potentially relevant for both enterprise and scientific HPC:

The goal of Microsoft Codename Astoria is to enable applications to expose data as a data service that can be consumed by web clients within a corporate network and across the internet. The data service is reachable over HTTP, and URIs are used to identify the various pieces of information available through the service. Interactions with the data service happens in terms of HTTP verbs such as GET, POST, PUT and DELETE, and the data exchanged in those interactions is represented in simple formats such as XML and JSON.
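To make that description concrete, here is a minimal sketch of the verb-to-operation mapping it describes. This is not Astoria's API: the `DataService` class, the `/runs` URI layout, and the record fields are all made up for illustration, and the "service" is an in-memory dictionary rather than a real HTTP server, so it only models how GET, POST, PUT and DELETE act on URI-identified resources with JSON payloads.

```python
import json


class DataService:
    """Toy in-memory stand-in for an HTTP data service (hypothetical, not Astoria)."""

    def __init__(self):
        self._runs = {}      # run id -> stored record (a dict)
        self._next_id = 1

    def _represent(self, run_id):
        # The representation sent to clients is the stored record plus its id.
        return dict(self._runs[run_id], id=run_id)

    def handle(self, method, uri, body=None):
        """Dispatch one request; returns (status_code, json_text_or_None)."""
        parts = [p for p in uri.split("/") if p]
        if not parts or parts[0] != "runs" or len(parts) > 2:
            return 404, None

        if len(parts) == 1:                      # the collection: /runs
            if method == "GET":
                listing = [self._represent(i) for i in sorted(self._runs)]
                return 200, json.dumps(listing)
            if method == "POST":                 # create a new resource
                run_id = self._next_id
                self._next_id += 1
                self._runs[run_id] = json.loads(body)
                return 201, json.dumps(self._represent(run_id))
            return 405, None

        try:                                     # a single resource: /runs/<id>
            run_id = int(parts[1])
        except ValueError:
            return 404, None
        if run_id not in self._runs:
            return 404, None
        if method == "GET":
            return 200, json.dumps(self._represent(run_id))
        if method == "PUT":                      # replace the stored record
            self._runs[run_id] = json.loads(body)
            return 200, json.dumps(self._represent(run_id))
        if method == "DELETE":
            del self._runs[run_id]
            return 204, None
        return 405, None


# Example: register a simulation run, then fetch it back by its URI.
svc = DataService()
status, created = svc.handle("POST", "/runs", json.dumps({"code": "cfd", "cells": 1200000}))
status, fetched = svc.handle("GET", "/runs/1")
```

The point of the sketch is that every record gets its own address and carries its own descriptive fields, so a client can ask for one run directly instead of scanning a directory of files.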

In the HPTC community we’ve long done an incredible job of creating “write once/read never” data. Part of this is due to the packrat nature of science and engineering users. But part is no doubt due to the flat, context-free way in which we store most of the petabytes of technical data we generate. After a few years it’s often easier to regenerate the data than to dig through thousands of files long since migrated to tape to find what’s needed. Perhaps something like Astoria can be part of a new way of thinking about storage going forward.


  1. […] John’s comment about “write-once/read-never” data on the Astoria Data Services post reminded me of some work being done at the University of Maryland’s HPSL lab to improve access to scientific data archives: […]