

Parabricks and SkyScale Raise the Performance Bar for Genomic Analysis

In this guest post, Tim Miller, President of SkyScale, offers insight into the genomic analysis performance achievable with a combination of Parabricks software and SkyScale’s Accelerated Cloud.


Tim Miller is President of SkyScale

Modern genomics involves the rapid production of vast amounts of raw sequencing data using next-generation sequencing (NGS), along with massive computing requirements for converting that data into useful results. The most popular toolkit for this secondary analysis is the GATK4 Best Practices pipeline. Traditionally, this work has been done on large numbers of CPUs. Parabricks has changed the paradigm by developing a GPU-based solution that adapts the GATK4 Best Practices workflows to execute on NVIDIA Tesla V100 hardware. When executed on the SkyScale Accelerated Cloud with a maximum configuration of 16 V100 GPUs, secondary analysis runs more than 40 times faster than on typical CPU-based configurations, reducing a two-day run to about an hour.


Figure 1. Main elements of the most powerful of SkyScale’s Accelerated Cloud Platforms, with details for a single NVIDIA Tesla V100 GPU module and for the SkyScale 16-GPU configuration.

In this article, we present the performance achievable in genomic analysis with a combination of Parabricks software and SkyScale’s Accelerated Cloud. The high performance computing (HPC) platform used in these benchmarks incorporates the Tesla™ V100 GPU from NVIDIA®. It was developed by One Stop Systems and is provided in the cloud by SkyScale. Configurations with 4, 8, and 16 GPUs are available, either on site from One Stop Systems or in the cloud from SkyScale. Figure 1 shows the main elements of the most powerful of SkyScale’s Accelerated Cloud Platforms, with details for a single NVIDIA Tesla V100 GPU module and for the SkyScale 16-GPU configuration.

The results of running Parabricks’ version of GATK4 on SkyScale with 16 NVIDIA Tesla V100 modules are shown in Figure 2, alongside results achieved on Amazon Web Services (AWS). SkyScale not only delivers higher performance with the same number of GPUs but also scales to larger configurations than are available at AWS. In addition to performance, SkyScale offers significantly lower costs than AWS.

The figure also reveals an additional feature of the Parabricks implementation: the flexible SkyScale 16-GPU node can be configured to use:

  • 16 GPUs to analyze a single genome
  • 8 GPUs on each of two genomes
  • 4 GPUs on each of four genomes

For example, analyzing a single genome with all 16 GPUs takes 45 minutes, while with four GPUs per genome the time increases to 109 minutes.


Because four analyses complete in those 109 minutes, the effective time per genome is one-quarter of that, or about 27 minutes. Throughput rises accordingly, from 32 to 53 genomes per day when four analyses run at once, which exceeds AWS capabilities by more than 2x.
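The throughput arithmetic can be checked with a short sketch. It uses only the two run times quoted in this article (45 minutes with 16 GPUs on one genome, 109 minutes with four GPUs on each of four genomes). It also assumes, purely for illustration, that batches run back to back around the clock with no setup or data-transfer overhead. The function and variable names are hypothetical and are not part of any Parabricks or SkyScale tooling.

```python
MINUTES_PER_DAY = 24 * 60

# Figures quoted in the article: label -> (genomes analyzed at once, minutes per batch).
# The 8-GPU x 2-genome layout is omitted because no run time is given for it.
CONFIGS = {
    "16 GPUs on 1 genome": (1, 45),
    "4 GPUs on each of 4 genomes": (4, 109),
}


def effective_minutes_per_genome(concurrent_genomes, batch_minutes):
    """Wall-clock minutes attributable to each genome in a batch."""
    return batch_minutes / concurrent_genomes


def genomes_per_day(concurrent_genomes, batch_minutes):
    """Daily throughput, assuming batches run back to back with no idle time."""
    per_genome = effective_minutes_per_genome(concurrent_genomes, batch_minutes)
    return MINUTES_PER_DAY / per_genome


if __name__ == "__main__":
    for label, (concurrent, minutes) in CONFIGS.items():
        per_genome = effective_minutes_per_genome(concurrent, minutes)
        daily = genomes_per_day(concurrent, minutes)
        print(f"{label}: {per_genome:.2f} min/genome, {daily:.1f} genomes/day")
```

Under these assumptions, the single-genome layout works out to 32 genomes per day, and the four-genome layout to roughly 27 minutes per genome, or about 53 genomes per day. A single analysis finishes sooner with all 16 GPUs, but total throughput is higher when the node is split.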


Figure 2. (Graph: One Stop Systems.) 

Additional downsides of running genomic analysis in the public cloud are eliminated by using SkyScale’s dedicated ‘bare metal’ platform. With SkyScale, the customer “rents” remote compute resources that are purchased, managed, and provisioned by SkyScale. There is no virtualized environment and no concern for multiple tenants on the same hardware. Data is always secure and private.

In the modern world of genomics, where research requires the analysis of tens of thousands of genomes, the cost per genome and the number of genomes analyzed per unit of time are critical parameters. Parabricks’ adaptation of the GATK4 Best Practices workflows, running seamlessly on SkyScale’s Accelerated Cloud, provides unparalleled price and throughput efficiency to help unlock the power of the human genome.

Tim Miller is President of cloud solutions provider SkyScale.
