Video: Kathy Yelick from LBNL Testifies at House Hearing on Big Data Challenges and Advanced Computing


In this video from the House Hearing on Big Data Challenges and Advanced Computing Solutions, Kathy Yelick from LBNL describes why the US needs to accelerate its efforts to stay ahead in AI and Big Data Analytics.

Data-driven scientific discovery is poised to deliver breakthroughs across many disciplines, and the U.S. Department of Energy, through its national laboratories, is well positioned to play a leadership role in this revolution. However, the scientific data sets being created, driven by DOE innovations in instrumentation and computing, are becoming increasingly challenging to sift through and manage.

Big data challenges are often characterized by the 4 Vs: volume (the total size), velocity (the speed at which data is produced), variability (the diversity of data types) and veracity (noise, errors and other quality issues). Scientific data has all of these. DOE’s user facilities are a major source of both the challenges and the opportunities: increasing data rates, reduced costs of collecting data and growing total data volumes make large data sets harder to manage but also more available for new discoveries.

Machine learning represents a promising approach for analytics in science, complementing but not replacing modeling and simulation. In her testimony, Yelick discusses the emerging role of machine-learning methods that have revolutionized the field of artificial intelligence and may similarly impact scientific discovery. She also talks about how Berkeley Lab and other national laboratories are applying machine learning tools and techniques to better analyze these data sets and empower scientists to ask and answer increasingly complex questions.

Other key points in her testimony include:

  • Examples of large-scale scientific data challenges in the DOE Office of Science, such as analyzing billions of microbes in complicated communities or millions of supernovae millions of light years away
  • The unique opportunities for machine learning in science, leveraging DOE’s national role as a leader in high performance computing, applied mathematics, user facilities and interdisciplinary team science
  • A vision for the national laboratories that includes foundational research in data science and an interconnected network of experimental and computational facilities to address some of the most challenging data analytics problems in science.
