Petabyte-Scale Active Archive in Private Object Storage

In big data science, storage archives protect massive volumes of research-critical content. This Sponsored Post explores how scientists at the University of Warsaw (UW) Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) rely on a petabyte-scale active archive built on modern storage technology.

Scientists at the University of Warsaw Interdisciplinary Centre for Mathematical and Computational Modelling rely on a petabyte-scale active archive built on modern storage technology. (Photo: Avere Systems)

Storage is an essential component of ICM’s OCEAN research data center supercomputing infrastructure: ICM has 10PB of primary storage and seven petabytes of archive capacity for high-performance computing (HPC) simulations/modelling and big data analytics.

The archive solution integrates an Avere storage gateway that gives systems and scientists seamless access to Western Digital Active Archive object storage[1]. Grzegorz Bakalarski, chief of the ICM division that administers supercomputing infrastructure, says the solution combines NAS functionality, which simplifies access, with the data durability required for cloud-scale environments. “The Avere technology lets us use familiar protocols and tools to connect to the Western Digital object storage. On the archive side, Western Digital’s 15-nines data durability ensures we can protect the valuable and often irreplaceable data generated by OCEAN supercomputers and researchers.”

ICM Deputy Director, Dr. Marek Michalewicz, adds, “One of the challenges we face in enabling big data science is providing sufficiently safe and affordable storage at petabyte scale. The combination of Avere FXT Edge filers and the Western Digital Active Archive System lets us take advantage of object storage efficiencies to support demand.”

Bakalarski says that in 2015, when ICM was planning the OCEAN data center project, object storage was an unfamiliar architecture to both scientists and systems administrators; in Poland there were only a few small object storage installations. “As part of the public procurement process, we stipulated that the archive solution must provide petabyte scale as well as accessibility via NFS and SMB protocols to make the capacity more immediately usable by the entire OCEAN team.”

The primary objective of the OCEAN project was to build out a center dedicated to big data research and expertise, providing HPC-grade infrastructure for data collection and storage, data curation, and advanced data analysis. Bakalarski explains, “In May 2015 we began construction in an open field, building the entire 6,000 m² facility from ground to roof – including power plants, climate control, fire protection, BMS, network systems, etc.

“In November 2015, we began the final stage and installed the huge IT systems: 1,100 nodes of a Cray XC40 supercomputer, 10PB of ultra-fast primary DDN storage, and a 400-node Huawei Big Data cluster.”

The archive system installation was one of the final deliverables. “When the Avere and Western Digital team arrived, we were able to count time-to-completion in hours. In less than one full day, the team had deployed the archive, offered up a brief workshop, and addressed all of our outstanding questions.”

Today an Avere FXT Edge filer cluster front-ending the Western Digital Active Archive System at the ICM OCEAN supercomputing data center presents some seven petabytes of usable archive capacity. The archive enables reliable access to aging data, ensuring availability for long-term and future research activities. The archive provides capacity for interdisciplinary teams representing some 200 scientists and developers working in areas such as air transportation, bioinformatics, climate modelling, computer-assisted medicine, cosmology, digital libraries, drug discovery, epidemiology, agriculture, high-energy physics, machine learning, material science, neurobiology, social-network analysis, numerical weather prediction, and more.

[1] The Western Digital Active Archive System was previously named Western Digital HGST.