Deep Learning at scale for the construction of galaxy catalogs

A team of scientists is now applying the power of artificial intelligence (AI) and high-performance supercomputers to accelerate efforts to analyze the increasingly massive datasets produced by ongoing and future cosmological surveys.

In a new study, researchers from NCSA and Argonne have developed a novel combination of deep learning methods to provide a highly accurate approach to classifying hundreds of millions of unlabeled galaxies. The team’s findings were published in Physics Letters B.

“The NCSA Gravity Group initiated, and continues to spearhead, the use of deep learning at scale for gravitational wave astrophysics. We have expanded our research portfolio to address a computational grand challenge in cosmology, innovating the use of several deep learning methods in combination with high-performance computing (HPC),” said Eliu Huerta, NCSA Gravity Group Lead. “Our work also showcases how the interoperability of NSF and DOE supercomputing resources can be used to accelerate science.”

“Deep learning research has rapidly become a booming enterprise across multiple disciplines. Our findings show that the convergence of deep learning and HPC can address big-data challenges of large-scale electromagnetic surveys. This research is part of a multidisciplinary program at NCSA to push the boundaries of AI and HPC in scientific research,” added Asad Khan, a graduate student in the NCSA Gravity Group and lead author of this study.

Supported by an ALCF Data Science Program award, the team used the Sloan Digital Sky Survey (SDSS) datasets produced by the Galaxy Zoo campaign to train neural network models to classify Dark Energy Survey (DES) galaxies that lie in the overlapping footprint of both surveys. The method identified spiral and elliptical galaxies with 99.6 percent accuracy.

“Using the millions of classifications carried out by the public in the Galaxy Zoo project to train a neural network is an inspiring use of the citizen science program,” said Elise Jennings, ALCF computer scientist. “This exciting research also sheds light on the inner workings of the neural network, which clearly learns two distinct feature clusters to identify spiral and elliptical galaxies.”

The team’s innovative framework lays the foundations to exploit deep transfer learning at scale, data clustering and recursive training to produce large-scale galaxy catalogs in the Large Synoptic Survey Telescope (LSST) era.

“We’re excited to work with the team at NCSA and Argonne as well as the researchers who drove the original Galaxy Zoo effort to pursue this important area of scientific discovery,” said Tom Gibbs, manager of developer relations at NVIDIA. “Using these new methods, we’re taking an important step to understanding the mystery of dark energy.”

Highlights of this study include:

  • The first application of deep transfer learning using disparate datasets for galaxy classification. The team used deep transfer learning to transfer knowledge from Xception, a state-of-the-art neural network model for image classification trained with the ImageNet dataset, to classify SDSS galaxy images. Transfer learning has traditionally been used in the computer science literature between similar datasets, such as images of human faces. In stark contrast, this study takes a model pre-trained for real-world object recognition and transfers its knowledge to classify galaxies.
  • The researchers developed open-source software stacks to extract galaxy images from the SDSS and DES surveys at scale using NCSA’s Blue Waters supercomputer. Deep learning algorithms were prototyped and trained using NVIDIA GPUs in the Bridges supercomputer at the Pittsburgh Supercomputing Center through the National Science Foundation’s Extreme Science and Engineering Discovery Environment (XSEDE). Finally, deep transfer learning was combined with distributed training on ALCF supercomputing resources to cut the training time of the Xception model on galaxy image datasets from five hours to just eight minutes.
  • The researchers used deep neural network classifiers to label over 10,000 unlabeled DES galaxies that have not been observed in previous surveys. The neural network models are then turned into feature extractors to show that these unlabeled datasets can be clustered according to their morphology, forming two distinct datasets.
  • ALCF researchers created a visualization to show the output of the penultimate layer of a deep neural network during training as it is learning to classify galaxies as spiral or elliptical.
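
The clustering idea in the last two highlights can be illustrated with a minimal Python sketch. It does not reproduce the team's code. The article does not say which clustering algorithm was used, so plain k-means with two clusters is assumed here, and synthetic 8-dimensional vectors stand in for the activations a trained network would produce at its penultimate layer. The function name `two_means` is hypothetical.

```python
import numpy as np


def two_means(features, n_iter=50):
    """Split feature vectors into two clusters with plain k-means (k=2)."""
    # Deterministic initialization: the first point, plus the point
    # farthest from it, serve as the two starting centers.
    first = features[0]
    farthest = features[np.argmax(np.linalg.norm(features - first, axis=1))]
    centers = np.stack([first, farthest])

    for _ in range(n_iter):
        # Distance from every point to each center: shape (n_points, 2).
        dists = np.linalg.norm(
            features[:, None, :] - centers[None, :, :], axis=2
        )
        labels = np.argmin(dists, axis=1)

        # Move each center to the mean of its assigned points
        # (keep the old center if a cluster happens to be empty).
        new_centers = np.stack([
            features[labels == k].mean(axis=0)
            if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    return labels, centers


# Synthetic stand-in for penultimate-layer features (an assumption):
# two well-separated groups of 50 vectors each, playing the role of
# the two morphology classes (spiral and elliptical).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=-3.0, scale=0.5, size=(50, 8))
group_b = rng.normal(loc=3.0, scale=0.5, size=(50, 8))
features = np.vstack([group_a, group_b])

labels, centers = two_means(features)
print(np.bincount(labels))  # number of vectors assigned to each cluster
```

In a real pipeline the `features` array would come from running galaxy images through the trained classifier with its final classification layer removed. The cluster labels could then be compared against the classifier's own predictions.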

Source: NCSA
