TACC’s Stampede2 HPC Helps ID New RNA Molecules for Disease Study

Researchers at Ghent University, Amsterdam University of Medicine, National Chiao Tung University, UNSW Sydney, Illumina, and the Baylor College of Medicine have built one of the most comprehensive catalogs of the human transcriptome to date.

By combining complementary sequencing techniques, they have deepened our understanding of the function of known ribonucleic acid (RNA) molecules and discovered thousands of new RNAs. RNA is a nucleic acid present in living cells whose principal role is to act as a messenger carrying instructions from DNA for controlling the synthesis of proteins.

The research, published in Nature Biotechnology in June 2021, is the result of more than five years of work to further unravel the complexity of the human transcriptome. A better understanding of the human transcriptome is essential for studying disease processes and uncovering novel genes that may serve as therapeutic targets or biomarkers.

The researchers relied on supercomputing resources to prove that these genes play a role in cells and tissues and are not merely byproducts of other cellular processes.

“Over the past three years we’ve received generous allocations of computing time on the Stampede2 supercomputer,” said Pavel Sumazin, an associate professor in pediatrics–oncology at Baylor College of Medicine and member of the Dan L Duncan Comprehensive Cancer Center. “We used Stampede2 to predict the function of thousands of genes that were never before identified. The validation of these predictions verified that these genes—including thousands of uncharacterized single-exon long non-coding RNAs, which were previously categorized as junk RNAs—are important regulators of key pathways in multiple human cells and tissues.”

“This analysis was computationally intensive because it required computing distance and delta distance correlations for many billions of gene pairs and triplets, respectively, including the creation of null distributions to evaluate significance,” Sumazin said.
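To give a sense of the kind of computation Sumazin describes, the following is a minimal sketch of a distance correlation between two expression profiles, with a permutation-based null distribution used to judge significance. It is not the consortium's pipeline: the function names, the permutation scheme, and the parameter defaults are illustrative assumptions, and the "delta distance correlation" on gene triplets mentioned in the quote is not covered here.

```python
import numpy as np


def _centered_distances(values):
    """Return the double-centered pairwise distance matrix of a 1-D sample."""
    v = np.asarray(values, dtype=float)
    d = np.abs(v[:, None] - v[None, :])
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length 1-D profiles."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()       # squared sample distance covariance
    dvar_x = (a * a).mean()      # squared sample distance variance of x
    dvar_y = (b * b).mean()      # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0               # a constant profile carries no dependence signal
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def permutation_p_value(x, y, n_permutations=1000, seed=0):
    """Observed distance correlation and a permutation p-value.

    The null distribution is built by shuffling y, which breaks any
    dependence between the two profiles (an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = distance_correlation(x, y)
    null = np.array([
        distance_correlation(x, rng.permutation(y))
        for _ in range(n_permutations)
    ])
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, float(p_value)


if __name__ == "__main__":
    # Toy profiles (made-up numbers): y depends on x non-linearly.
    x = np.linspace(-1.0, 1.0, 50)
    y = x ** 2
    print(permutation_p_value(x, y, n_permutations=200))
```

Unlike a Pearson correlation, distance correlation can pick up non-linear dependence, which is one reason it is costly: each gene pair needs its own pairwise distance matrices, and each significance estimate repeats that work for every permutation. Scaled to billions of pairs, that cost helps explain the need for a machine like Stampede2.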

RNAs in All Shapes and Sizes

The transcriptome is the sum of all RNA molecules that are transcribed from the DNA strands that make up our genome. However, the relationship between genome and transcriptome is not one-to-one.

First, each cell and tissue type has a unique transcriptome, with varying RNA production and composition, including tissue-specific RNAs. Second, not all RNAs are transcribed from typical protein-coding genes that eventually produce proteins. Many of our RNA molecules are not used as a template to build proteins. They originate from what was once called junk DNA: long sequences of DNA with unknown functions.

These non-coding RNAs (ncRNAs) come in all kinds of shapes and sizes: short, long, and even circular RNAs. Many of them even lack the tail of adenine nucleotides (the polyA tail) that is typical for protein-coding RNAs.

300 Human Cell and Tissue Types, 3 Sequencing Methods

“There have been other projects to catalog our transcriptome, but the RNA-Atlas project is unique because of the applied sequencing methods,” said Pieter Mestdagh, professor at the Center for Medical Genetics at Ghent University.

[Image: Stampede2 supercomputer at the Texas Advanced Computing Center]

“Not only did we look at the transcriptome of as many as 300 human cell and tissue types but, most importantly, we did so with three complementary sequencing technologies, one aimed at small RNAs, one aimed at polyadenylated (polyA) RNAs, and a technique called total RNA sequencing.”

This last sequencing technology led to the discovery of thousands of novel non-coding RNA genes, including a novel class of non-polyadenylated single-exon genes and many new circular RNAs.

By combining and comparing the results of the different sequencing methods, the researchers were able to define three properties for every measured RNA transcript: its abundance in the different cells and tissues, whether it has a polyA tail (for some genes this appears to differ from cell type to cell type), and whether it is linear or circular.
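The idea of comparing sequencing methods to infer polyA status can be illustrated with a toy rule: a transcript that is clearly detected by total RNA sequencing but strongly depleted in the polyA library is a candidate non-polyadenylated RNA. The sketch below is only an assumption about how such a comparison might look. The function name, thresholds, pseudocount, and abundance values are all hypothetical and do not come from the RNA-Atlas analysis.

```python
import math


def call_polya_status(total_abundance, polya_abundance,
                      min_abundance=1.0, min_log2_depletion=2.0,
                      pseudocount=1.0):
    """Classify one transcript in one sample from two abundance estimates.

    All thresholds are illustrative assumptions, not published cutoffs.
    """
    if max(total_abundance, polya_abundance) < min_abundance:
        return "not detected"
    # Log ratio of polyA-library abundance to total-RNA abundance;
    # the pseudocount avoids division by zero for absent transcripts.
    log2_ratio = math.log2((polya_abundance + pseudocount)
                           / (total_abundance + pseudocount))
    if log2_ratio <= -min_log2_depletion:
        return "non-polyadenylated"
    return "polyadenylated"


# Made-up abundances for one hypothetical transcript in two cell types:
# (total RNA library, polyA library).
samples = {
    "cell_type_A": (50.0, 0.5),
    "cell_type_B": (40.0, 45.0),
}
status_by_cell_type = {
    name: call_polya_status(total, polya)
    for name, (total, polya) in samples.items()
}
print(status_by_cell_type)
```

Applying the call per sample rather than per gene is what would allow a status that differs from cell type to cell type to show up, as the article notes for some genes.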

Moreover, the consortium found important clues to the function of some of the ncRNAs. By looking at the abundance of different RNAs in different cell types, they found correlations that indicate regulatory functions and could determine whether this regulation happens at the transcriptional level (by preventing or stimulating transcription of protein-coding genes) or at the post-transcriptional level (e.g., by breaking down RNAs).

All data, analyses, and results are available for download and interrogation in the R2 web portal, enabling the scientific community to use this resource as a tool for exploring non-coding RNA biology and function.

“By combining all data in one comprehensive catalog, we have created a new valuable resource for biomedical scientists around the world studying disease processes,” Sumazin said. “The age of RNA therapeutics is swiftly rising – we’ve all witnessed the impressive creation of RNA vaccines, and already the first medicines that target RNA are used in the clinic. I’m sure we’ll see lots more of these therapies in the next years and decades.”

source: Molly Chiu, Baylor College of Medicine / Faith Singer, TACC