Turning AI Against Cancer at the Data Science Bowl


Today Booz Allen Hamilton and Kaggle announced the winners of the third annual Data Science Bowl, a competition that harnesses the power of data science and crowdsourcing to tackle some of the world’s toughest problems. This year’s challenge brought together nearly 10,000 participants from across the world. Collectively, they spent an estimated 150,000-plus hours and submitted nearly 18,000 algorithms, all aimed at helping medical professionals detect lung cancer earlier and more accurately.

“The Data Science Bowl shows that the power of collective ingenuity, data science and advanced analytics can be harnessed to tackle society’s toughest challenges, like eradicating cancer,” said Josh Sullivan, senior vice president at Booz Allen. “This year’s complex problem, improving the accuracy of lung cancer screening, required the diversity of perspectives and approaches that only a crowdsourced challenge like the Data Science Bowl can provide. We look forward to advancing these solutions in the fight against cancer.”

2017 Data Science Bowl winners include:

  • First Place: Liao Fangzhou and Zhe Li, two researchers from China’s Tsinghua University who have no formal medical background but applied their analytics skills to an unfamiliar and challenging area of research.
  • Second Place: Julian de Wit and Daniel Hammack, both software and machine learning engineers based in the Netherlands. Julian came in third in the Data Science Bowl 2016.
  • Third Place: Team Aidence, members of which work for a Netherlands-based company that applies deep learning to medical image interpretation.

Lung cancer is the most common type of cancer worldwide, affecting nearly 225,000 people each year in the United States alone. Low-dose computed tomography (CT) is a breakthrough technology for early detection, with the potential to reduce lung cancer deaths by 20 percent. However, the technology must first overcome its relatively high false positive rate.

Using anonymized high-resolution lung scans provided by the National Cancer Institute (NCI), in one of the largest such data sets ever made publicly available, participants created algorithms to improve lung cancer screening technology. The algorithms aim to determine accurately when lesions in the lungs are cancerous and to dramatically decrease the false positive rate of current low-dose CT technology.

“This is one of the most important competitions Kaggle has ever hosted,” said Anthony Goldbloom, CEO, Kaggle. “Recent breakthroughs in deep neural networks may make it feasible to diagnose lung cancer from CT scans with higher accuracy than previously possible. The interest in this year’s Data Science Bowl has been unprecedented for a competition of this size. The results are incredibly promising.”

Top teams will present their winning solutions next week at the 2017 GPU Technology Conference in San Jose, California, hosted by NVIDIA, a Data Science Bowl sponsor.

“Reducing the false positive rate of low-dose CT scans is a critical step in improving the accuracy of CT screening for lung cancer and having a positive impact on public health,” said Keyvan Farahani, Program Director, National Cancer Institute, who provided scientific guidance on the competition’s design and datasets. “NCI is committed to working closely with the scientific community, the Food and Drug Administration, and other stakeholders to utilize this year’s top-ranking solutions to further advance the field of lung cancer screening.”

Sign up for our insideHPC Newsletter