Cray Henry on HPCMP Archive Strategy

Cray Henry, director of the DoD’s High Performance Computing Modernization Program, has written an article for Government Computer News detailing the current and future plans for the HPCMP’s data archive system.  For those who don’t know, the HPCMP is a program tasked with providing computational and scientific support to over 4,000 scientists and engineers performing DoD-sponsored R&D.  According to the article, the program expects to generate, each year, one-third the amount of data it has accumulated over its 15-year history.  The HPCMP has begun to look at more efficient ways of managing all this data:

Meeting the next five years’ storage requirements will involve increasing the number of machines devoted to storage, improving mechanisms for predicting future storage needs, and possibly integrating algorithms into applications that allow users to catalog and define the storage period for new data. 

and

During the next year, we will institute a number of strategies, such as a revised retention policy, reliance on the users to more proactively manage their data and an upgrade of storage systems, including new storage-density technologies. 

For more info on what the HPCMP is doing to manage its data across the MSRCs and ADCs, read the full article here.

Comments

  1. What you are doing may be impressive in terms of storage, but it is not “archiving.” Archived information is organized, searchable, managed, preserved, and intended to be accessible over time and usable by future generations.

  2. John Leidel says:

    This is most definitely data archiving. The data archives currently present in many [if not all] of the major shared resource centers retain all of your aforementioned qualities. Considering the age of the HPCMP, it’s simply not mathematically possible for the data archive to be useful to multiple generations. The program is only 15 years old.

  3. With all due respect, having been an onsite at one of the centers listed in the article, I can tell you that Mr. Henry has no clue about the validity of the data he is trying to figure out how to store and pay for. He is a bean counter plain and simple… always was, always will be. For him it always boils down to cost and innovation goes out the window. After looking at one user who had a terabyte of Netscape caches archived, I realized this is write once, read and validate never.

    Just my $0.02

  4. Patrice – the preservation community certainly has a different norm for the use of the word “archiving” than the HPC and IT communities in general, and I’ll grant you that we aren’t archiving data in the same sense that the presidential libraries archive papers of our past leaders. But our use of the word “archival” in this sense is completely within the bounds of the IT and HPC communities’ use of the word, and our data is indeed organized, searchable, managed, preserved, and (if you read the linked article you’ll see) completely intended to be accessed in the future. Thus the need to store it. I have 3 PBytes of the stuff in my center, one of those referenced in the article.

  5. Anon – are you still in the PET program? :-)
