New devices generate terabytes (TB) of data per day, so storing this data in real time and making it available to downstream workloads is critical for most organizations. A computing and storage infrastructure must give researchers the tools to perform their work without waiting for data to be delivered or acted upon. The growing volume of data calls for new collaborations between hardware and software vendors to create these much-needed, balanced solutions.
While simulations continue to be optimized for the latest computing systems, the storage subsystem must also be optimized for genomics and life science workflows. Spreading the storage and retrieval of these massive amounts of data requires a parallel file system that is aware of the clients, the servers, the network fabric, and the type of data in use. Parallel file systems can deliver an order-of-magnitude performance increase over standard NFS-based systems.
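The core idea behind that performance gain is striping: rather than funneling all I/O through a single NFS endpoint, a parallel file system splits a file into stripe units and distributes them round-robin across many storage targets, aggregating their bandwidth. The sketch below is purely illustrative (it does not model any real parallel file system's API, and the 4-byte stripe size is a toy value; production systems use units on the order of 1 MiB):

```python
STRIPE_SIZE = 4  # bytes per stripe unit; illustrative only (real systems use ~1 MiB)

def stripe(data: bytes, num_targets: int, stripe_size: int = STRIPE_SIZE):
    """Split data into stripe units and assign them round-robin to targets."""
    targets = [bytearray() for _ in range(num_targets)]
    for i in range(0, len(data), stripe_size):
        targets[(i // stripe_size) % num_targets] += data[i:i + stripe_size]
    return [bytes(t) for t in targets]

def unstripe(targets, total_len: int, stripe_size: int = STRIPE_SIZE) -> bytes:
    """Reassemble the original byte stream from the per-target pieces."""
    out = bytearray()
    offsets = [0] * len(targets)
    i = 0
    while len(out) < total_len:
        t = i % len(targets)                       # next target in round-robin order
        out += targets[t][offsets[t]:offsets[t] + stripe_size]
        offsets[t] += stripe_size
        i += 1
    return bytes(out)

# Toy example: 20 bytes of sequence data striped across 3 storage targets.
data = b"ACGTACGTACGTACGTACGT"
pieces = stripe(data, num_targets=3)
assert unstripe(pieces, len(data)) == data  # round-trip recovers the file
```

Because consecutive stripe units land on different targets, a large sequential read can be serviced by all targets in parallel, which is why throughput scales with the number of storage servers rather than being capped by a single one.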
Organizations such as the Tohoku Medical Megabank Organization (ToMMo) at Tohoku University are making huge strides in genomics and human health. ToMMo combines the latest storage hardware with parallel file system software to conduct epidemiological research studies on hundreds of thousands of citizens in the region.
The combination of the latest generation of processors and coprocessors, parallel file system software, and state-of-the-art hardware provides a powerful platform for the ingest, analysis, search, collaboration, and archiving of massive amounts of genomic data. This whitepaper details how these technologies are being used in the life sciences to accelerate the understanding of genomics.
Download the whitepaper.