Broad Institute and Intel Advance Genomics

This guest post explores how the Broad Institute of MIT and Harvard is working with Intel to accelerate genomic analysis.

The Broad Institute of MIT and Harvard, in collaboration with Intel, is playing a major role in accelerating genomic analysis.

As the volume of genomics data grows, the collaboration is focused on technology that enables genomics analytics at scale. The latest result is Intel Select Solutions for Genomics Analytics, a suite of optimized software together with reference architectures for turnkey configuration, setup, and deployment of genomics analysis. Broad’s Genome Analysis Toolkit (GATK) is a key part of Intel Select Solutions for Genomics Analytics.

Geraldine Van der Auwera, Broad’s Associate Director of Outreach and Communications, Data Sciences Platform Group, commented, “Our goal is to reduce the challenges that researchers face to generate ever-more meaningful insights from ever-larger sets of genomics data. We’re working with Intel to make the GATK Best Practices pipelines run even faster, at even greater scale, and with easier deployment for genomic research worldwide.”

Broad is an academic, non-commercial entity interested in furthering science and curing disease. The institute, one of the world’s largest producers of human genome data, creates about 24 TB of new data per day and manages more than 50 PB of data.

Genomics received a major boost in 2003 with the first sequencing of the human genome by the Human Genome Project. This was an ambitious undertaking that spanned 13 years and cost $3 billion. Now Broad, working with Intel, is helping to usher in a new era by developing advanced, affordable tools for analyzing the torrents of data that the genomics community is creating.

The two organizations have collaborated on computing infrastructure and software optimization for years. Last year they launched the Intel-Broad Center for Genomic Data Engineering to simplify and accelerate genomics workflow execution using a variety of advanced tools and techniques – including the popular GATK, a set of over 100 tools and a framework for genomic analysis.

Intel has worked with the institute to accelerate compute-intensive genomics workloads by developing the Genomics Kernel Library for Intel architecture, and by optimizing and benchmarking best-practices workflows on the latest Intel reference hardware platform.

Several years ago, processing a whole genome sequence (WGS) took 36.12 hours. Most recently, GATK-enabled analysis running on an Intel Xeon Scalable processor platform finished the job in 10.8 hours, a 3.3x speedup.
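The speedup figure follows directly from the two run times quoted above. As a minimal sketch of that arithmetic (the function and variable names are illustrative and not part of GATK or any Intel tooling):

```python
def speedup(baseline_hours: float, optimized_hours: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    if baseline_hours <= 0 or optimized_hours <= 0:
        raise ValueError("run times must be positive")
    return baseline_hours / optimized_hours


# Run times as quoted in the article; nothing else is assumed about the runs.
baseline_hours = 36.12   # earlier whole-genome processing time
optimized_hours = 10.8   # time on the Intel Xeon Scalable processor platform

ratio = speedup(baseline_hours, optimized_hours)
hours_saved = baseline_hours - optimized_hours

print(f"speedup: {ratio:.1f}x")
print(f"hours saved per genome: {hours_saved:.2f}")
```

The ratio comes out a little above 3.3, which matches the rounded figure quoted in the article.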

According to Eric Banks, Senior Director of the Broad Institute’s Data Sciences Program, Intel’s help has been invaluable.  “We are reaching levels of analysis that were not possible before,” he says.

The team is using Intel Select Solutions, which are verified hardware and software stacks optimized for specific software workloads across compute, storage, and network. Intel Select Solutions for Genomics Analytics combine hardware and software components chosen specifically for this workload and package them as a high-performance compute cluster that complies with the Intel Scalable System Framework architecture.

Comments Banks, “The human genome is very large.  What we are doing goes far beyond just processing this data – it requires complex analysis to determine which indications point to an underlying disease.  We now have massive amounts of data that includes many samples from the same disease type that we can study together using a process known as joint analysis.

“The first time we did this was several years ago with a cohort of about 3000 samples,” he adds. “It took about six weeks to analyze the 3K genomes. Last year we analyzed 15,000 genomes, an order of magnitude larger, running the job on cloud infrastructure, using analysis software based on Intel Select Solutions for Genomics Analytics. We used only a small percentage of the system’s capacity and had our results in less than two weeks. We are now preparing to analyze 72,000 genomes, which should take a week and a half at this point. There is no way that we could have achieved these results with the previous platform.”
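To put the joint-analysis figures Banks quotes on a common footing, they can be converted into genomes processed per week. The sketch below is a rough back-of-the-envelope comparison only. It assumes the quoted durations are exact, treats "less than two weeks" as a full two weeks (so that throughput is a lower bound), and uses a hypothetical `JointAnalysisRun` record that is not part of any Broad or Intel software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JointAnalysisRun:
    """One joint-analysis run, using the approximate figures from the quote."""
    label: str
    genomes: int
    weeks: float  # approximate duration in weeks

    @property
    def genomes_per_week(self) -> float:
        return self.genomes / self.weeks


runs = [
    JointAnalysisRun("first cohort", 3_000, 6.0),
    # "less than two weeks": 2.0 is an upper bound on the duration,
    # so the throughput computed here is a lower bound.
    JointAnalysisRun("last year", 15_000, 2.0),
    # Planned run; the duration is a projection, not a measured result.
    JointAnalysisRun("planned", 72_000, 1.5),
]

baseline = runs[0].genomes_per_week
for run in runs:
    relative = run.genomes_per_week / baseline
    print(f"{run.label}: {run.genomes_per_week:,.0f} genomes/week "
          f"({relative:.0f}x the first cohort)")
```

On these assumptions, the throughput gain between runs is considerably larger than the growth in cohort size alone would suggest.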

Learn more about Intel Select Solutions for Genomics Analytics.

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or specific configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance benchmarks, visit intel.com/benchmarks.

Intel technologies, features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.