TACC builds TeraGrid's largest data store


Late last week the Texas Advanced Computing Center (TACC) posted a story about Corral, its new data repository, which came online at the end of March:

With 16 Dell server nodes and 1.2 petabytes of DataDirect Networks storage — twice the space required to hold the entire Netflix DVD catalogue, and four times larger than any current data-collection resource on the TeraGrid — Corral will effortlessly handle the challenges and opportunities of data-driven science.

“We’re ahead of the curve in terms of providing this kind of dedicated data collection and application resource,” Jordan said. “A lot of other sites are doing data collections, but very few sites are providing this kind of universally accessible, unified resource.”

So who is using the resource so far? A few examples from the story:

Herbarium Digitization, The University of Alaska Museum of the North – One of the world’s premier collections of arctic and boreal plants. With support from the National Science Foundation, the Herbarium is taking high-resolution digital photographs of 230,000 pressed plants to capture data about the collection and to make these specimens more accessible for research and education. The images are archived as digital negatives, the most data-intensive file format, preserving all of the data captured by the camera. Making these images publicly available requires four terabytes of rapidly accessible Web storage. Corral will be used to process, manage and store the digital images and other data generated by the project, and will provide high-speed access to this data for researchers and members of the public anywhere in the world. More information available at: http://arctos.database.museum/uam_herb

Institute for Classical Archaeology (ICA), Liberal Arts, The University of Texas at Austin – ICA will use Corral to preserve, protect and disseminate two dynamic datasets to the wider academic community and the public. The first dataset contains information gathered during an intensive field survey of ancient sites in the territory of Metaponto in South Italy where data were documented using GPS and incorporated with remote-sensing imagery into a geographic information system. The second dataset involves excavations in an area of the Greek, Roman and Byzantine city of Chersonesos in Crimea (Ukraine). These spatial and contextual datasets also contain extensive data produced in the course of specialist research into forensic anthropology and ancient agriculture and technology. More information available at: http://www.utexas.edu/research/ica/

But what about the big boys?

At approximately five terabytes, the University of Alaska’s digital datasets are some of the smaller collections on Corral. The big players, whose funding commitments made Corral possible, are The Center for Predictive Engineering and Computational Sciences (PECOS) project from the Institute for Computational Engineering and Sciences (ICES) at The University of Texas at Austin, and research at the Center for Space Research, also at The University of Texas at Austin [see descriptions below].

These projects anticipate using more than 100 terabytes of storage each and are greatly helped by the fact that Corral is directly linked to TACC’s large computational and visualization systems.