DDN Storage Speeds Genome Sequencing at University of Miami

Today DDN announced that the University of Miami’s Center for Computational Science (CCS) has deployed high-performance DDN GS12K scale-out file storage to speed scientific discoveries and boost collaboration with researchers around the world. CCS maintains one of the largest centralized academic cyberinfrastructures in the country, which fuels critical discoveries in Alzheimer’s, Parkinson’s, gastrointestinal cancer, paralysis and climate modeling, as well as marine and atmospheric science research.

More than 2,000 internal researchers and a dozen expert collaborators across academic and industry sectors worldwide work together in workflow management, data management, data mining, decision support, visualization and cloud computing. To streamline workflows and keep pace with data-intensive discovery demands, CCS has integrated its High Performance Computing (HPC) environment with data capture and analytics capabilities so that data can move transparently between research steps.

To simplify data capture and analysis, CCS relies on DDN’s powerful and versatile GS12K storage to handle bandwidth-driven workloads while serving very high IOPS demand resulting from intense user interaction. As a result, the center now captures, stores and distributes massive amounts of data generated from multiple scientific models running different simulations on 15 Illumina HiSeq sequencers simultaneously on DDN storage. Thanks to DDN, the center has reduced its number-crunching time for genome mapping and SNP calling from 72 to 17 hours.

“DDN enabled us to analyze thousands of samples for the Cancer Genome Atlas, which amounts to nearly a petabyte of data,” said Dr. Nicholas Tsinoremas, director of the Center for Computational Science at the University of Miami. “Having a robust storage platform like DDN is essential to driving discoveries such as our recent study that revealed a link between certain viruses and gastrointestinal cancers. Previously, we couldn’t have done that level of computation.”

In addition to providing significant storage processing power to meet both high I/O and interactive processing requirements, CCS needed a flexible file system that could support large parallel and short serial jobs. Additionally, the center had to address “data in flight” challenges resulting from major data surges during analysis, which often caused a 10x spike in storage. DDN’s ability to adapt easily to all of CCS’ requirements enabled the center to leverage one centralized storage platform for all its needs while scaling seamlessly without adding a layer of complexity.

Moreover, DDN’s best-in-class performance for genomics assembly, alignment and mapping enables CCS to support all its application needs easily, including the use of BWA and Bowtie for initial mapping as well as SAMtools and GATK for variant analysis and SNP calling.

“Our arrangement is to share data or make it available to anyone asking, anywhere in the world,” added Tsinoremas. “Now we have the storage versatility to attract researchers from both within and outside the HPC community. With DDN, we’re well positioned to generate, analyze and integrate all types of research data to drive major scientific discoveries and breakthroughs.”
