Advanced Protein Prediction Using Deep Learning on Blue Waters Supercomputer


Jian Peng

Researchers at NCSA used the Blue Waters supercomputer and deep learning to achieve a breakthrough in protein structure prediction. The research, published in the journal Cell Systems, was conducted by Jian Peng, NCSA Faculty Fellow and Assistant Professor in the Department of Computer Science at Illinois, and Yang Liu, a graduate student in the Department of Electrical and Computer Engineering.

Peng’s research aims to develop a more accurate function for evaluating predicted protein structures through his deep learning tool, DeepContact. DeepContact automatically leverages local information and multiple features to discover patterns in contact map space and embeds this knowledge within the neural network. When later predicting new proteins, DeepContact uses what it has learned about structure and contact map space to impute missing contacts and remove spurious predictions, leading to significantly more accurate inference of residue-residue contacts.

Essentially, this tool converts hard-to-interpret coupling scores into probabilities, moving the field toward a consistent process to assess contact prediction across diverse proteins.
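To make the idea concrete, here is a minimal sketch of turning a matrix of raw coupling scores into contact probabilities by looking at each residue pair's local neighborhood. It is not DeepContact's actual architecture: the fixed 3x3 averaging kernel stands in for filters a real network would learn, and the function names, `weight`, and `bias` are hypothetical values chosen only for illustration.

```python
import numpy as np

def smooth_local(scores, kernel):
    """Same-size 2D filtering with zero padding (kernel is symmetric here)."""
    k = kernel.shape[0] // 2
    padded = np.pad(scores, k, mode="constant")
    n = scores.shape[0]
    out = np.zeros((n, n), dtype=float)
    # Accumulate each shifted copy of the map, weighted by the kernel entry.
    for di in range(kernel.shape[0]):
        for dj in range(kernel.shape[1]):
            out += kernel[di, dj] * padded[di:di + n, dj:dj + n]
    return out

def contact_probabilities(scores, weight=4.0, bias=-2.0):
    """Map raw coupling scores to values in (0, 1).

    Assumption: a fixed averaging kernel plus a logistic function replaces
    the trained convolutional layers of a real model.
    """
    scores = np.asarray(scores, dtype=float)
    scores = (scores + scores.T) / 2.0        # contact maps are symmetric
    kernel = np.full((3, 3), 1.0 / 9.0)       # hypothetical, untrained filter
    local = smooth_local(scores, kernel)
    logits = weight * local + bias            # hypothetical calibration
    return 1.0 / (1.0 + np.exp(-logits))

# Toy 8-residue example: one isolated high score and one supported cluster.
scores = np.zeros((8, 8))
scores[0, 7] = scores[7, 0] = 1.0             # isolated, likely spurious
for i, j in [(2, 4), (2, 5), (3, 4), (3, 5)]:
    scores[i, j] = scores[j, i] = 1.0         # neighbors reinforce each other

probs = contact_probabilities(scores)
print(probs[0, 7] < probs[2, 4])
```

In this toy map the isolated score ends up with a lower probability than the scores in the cluster, which mirrors, in a very simplified way, how local context can downweight spurious predictions.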

Applying existing protein structure prediction algorithms and sampling techniques generates a massive dataset, which is then processed at scale on the Blue Waters supercomputer. Based on this dataset, Peng hopes to develop a new structure motif-based deep neural network to assess the structural quality of predictions and to strengthen existing structure prediction algorithms.

Peng’s team, iFold, was top-ranked at the 12th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP12) last year.

“We greatly improved the prediction accuracy for protein residue contact,” said Peng. “We believe that the improved contact prediction will further help us get closer to the ultimate goal of protein folding.”

Proteins are able to perform their biological functions when they coil and fold into specific three-dimensional shapes. When proteins misfold, they malfunction, resulting in diseases such as Alzheimer’s disease. Peng’s research will use DeepContact to improve models for protein folding, which will facilitate a paradigm shift in protein structure prediction.

DeepContact Integrates Local Information to Improve Contact Prediction

Peng plans to collaborate with NCSA affiliate Dr. Matthew Turk, using NCSA’s high-performance CPU and GPU resources and expanding on more efficient distributed implementations to accelerate both structure generation and the training of deep neural networks.

Earlier this year, NCSA was awarded a $2.7 million grant from the National Science Foundation for deep learning research, which included Peng as a co-PI.
