Argonne Publishes AI for Science Report

Argonne National Lab has published a comprehensive AI for Science Report based on a series of Town Hall meetings held in 2019. Hosted by Argonne, Oak Ridge, and Berkeley National Laboratories, the four town hall meetings were attended by more than 1,000 U.S. scientists and engineers. The goal of the town hall series was to examine scientific opportunities in the areas of artificial intelligence (AI), Big Data, and high-performance computing (HPC) in the next decade, and to capture the big ideas, grand challenges, and next steps to realizing these opportunities.

In this report and in the Department of Energy (DOE) laboratory community, we use the term “AI for Science” to broadly represent the next generation of methods and scientific opportunities in computing, including the development and application of AI methods (e.g., machine learning, deep learning, statistical methods, data analytics, automated control, and related areas) to build models from data and to use these models alone or in conjunction with simulation and scalable computing to advance scientific research. The AI for Science town hall discussions focused on capturing the transformational uses of AI that employ HPC and/or data analysis, leveraging data sets from HPC simulations or instruments and user facilities, and addressing scientific challenges unique to DOE user facilities and the agency’s wide-ranging fundamental and applied science enterprise.

The town halls engaged diverse science and user facility communities, with both discipline- and infrastructure-specific representation. The discussions, captured in the 16 chapters of this report, contain common arcs revealing classes of opportunities to develop and exploit AI techniques and methods to improve not only the efficacy and efficiency of science but also the operation and optimization of scientific infrastructure.

The community’s experience with machine learning (ML), HPC simulation, data analysis methods, and the consideration of long-term science objectives revealed a growing collection of unique and novel opportunities for breakthrough science, unforeseeable discoveries, and more powerful methods that will accelerate science and its application to benefit the nation and, ultimately, the world.

New AI techniques will be indispensable to supporting the continued growth and expansion of DOE science infrastructure from ESnet to new light sources to exascale systems, where system scale and complexity demand AI-assisted design, operation, and optimization. Toward this end, novel AI approaches to experiment design, in-situ analysis of intermediate results, experiment steering, and instrument control systems will be required.

DOE’s co-design culture involving teams of scientific users, instrument providers, mathematicians, and computer scientists can be leveraged to develop new capabilities and tools such that they can be readily applied across the agency’s (and indeed the nation’s) diversity of instruments, facilities, and infrastructure. This report captures some early opportunities in this direction, but much more needs to be explored. From chemistry to materials sciences to biology, the use of ML and deep learning (DL) techniques opens the potential to move beyond today’s heuristics-based experimental design and discovery to AI-enhanced strategies of the future.

Early use of generative models in materials exploration suggests that millions of possible materials could be identified with desired properties and functions and evaluated with respect to synthesizability. The synthesis and testing stages necessary for such scales will in turn rely on ML and adaptive, autonomous robotic control of high-throughput synthesis and testing lines, creating “self-driving” laboratories.

The same complexity challenge and concomitant need to move from human-in-the-loop to AI-driven design, discovery, and evaluation also manifest across the design of scientific workflows, the optimization of large-scale simulation codes, and the operation of next-generation instruments.

Exascale systems and new scientific instruments, such as upgraded light sources and accelerators, are increasing the velocity of data beyond the capabilities of existing instrument data transmission and storage technologies. Consequently, real-time hardware is needed to detect events and anomalies in order to reduce the raw instrument data rates to manageable levels. New ML, including DL, capabilities will be critically important in order to fully exploit these instruments, replacing pre-programmed hardware event triggers with algorithms that can learn and adapt, as well as discover unforeseen or rare phenomena that would otherwise be lost in compression.
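The shift from fixed hardware triggers to learned, adaptive ones can be illustrated with a toy sketch: a detector that maintains running statistics over a data stream and flags samples that deviate sharply from the learned baseline, rather than comparing against a pre-programmed threshold. This is a minimal illustration of the idea, not code from the report; the class name, threshold, and warmup values are all assumptions chosen for the example.

```python
import math

class AdaptiveTrigger:
    """Toy adaptive event trigger: flags samples whose z-score against a
    running mean/variance (Welford's online algorithm) exceeds a threshold.
    Illustrative stand-in for the learned triggers described above; all
    names and parameters here are assumptions, not from the report."""

    def __init__(self, z_threshold=4.0, warmup=20):
        self.z = z_threshold
        self.warmup = warmup      # samples to observe before triggering
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0             # running sum of squared deviations

    def observe(self, x):
        """Return True if x looks anomalous relative to the baseline."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.z:
                anomalous = True
        if not anomalous:         # keep outliers out of the baseline
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return anomalous

# A steady sensor stream with one rare event at the end.
trigger = AdaptiveTrigger()
stream = [10.0 + 0.1 * math.sin(i) for i in range(100)] + [25.0]
flags = [trigger.observe(x) for x in stream]
```

Because the baseline adapts as data arrives, the same trigger can track slow drift in an instrument while still isolating rare events, which a fixed pre-programmed threshold cannot do.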

In recent years, the success of DL models has resulted in enormous computational workloads for training AI models, representing a new genre of HPC resource demand. Here, the use of AI techniques to optimize learning algorithms and implementation will be necessary with respect to both the energy cost of large-scale computation and to the exploitation of new computing hardware architectures. AI in HPC has already taken the form of neural networks trained as surrogates to computational functions (or even entire simulations), demonstrating the potential for AI to provide non-linear improvements of multiple orders of magnitude in time-to-solution for HPC applications (and, coincidentally, reductions in their cost).
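The surrogate idea above can be sketched in a few lines: run an expensive simulation a handful of times, fit a cheap model to the results, and then query the model instead of the simulation. The sketch below uses a cubic polynomial fitted by gradient descent as the surrogate; `expensive_sim`, the sample points, and the learning rate are illustrative assumptions, not details from the report, and real surrogates are typically deep networks rather than polynomials.

```python
def expensive_sim(x):
    """Stand-in for a costly HPC simulation run (illustrative only)."""
    return x**3 - x

# A handful of "real" simulation runs become the training set.
xs = [i / 10 - 1.0 for i in range(21)]   # 21 sample points on [-1, 1]
ys = [expensive_sim(x) for x in xs]

# Surrogate: cubic polynomial fitted by batch gradient descent.
w = [0.0, 0.0, 0.0, 0.0]                 # coefficients for 1, x, x^2, x^3
lr = 0.2
for _ in range(5000):
    grad = [0.0, 0.0, 0.0, 0.0]
    for x, y in zip(xs, ys):
        feats = (1.0, x, x * x, x ** 3)
        err = sum(wi * fi for wi, fi in zip(w, feats)) - y
        for j, f in enumerate(feats):
            grad[j] += err * f
    for j in range(4):
        w[j] -= lr * grad[j] / len(xs)

def surrogate(x):
    """Cheap drop-in replacement for expensive_sim on [-1, 1]."""
    return w[0] + w[1] * x + w[2] * x * x + w[3] * x ** 3

# Worst-case disagreement between surrogate and simulation on the grid.
max_err = max(abs(surrogate(x) - expensive_sim(x)) for x in xs)
```

Once trained, each surrogate call costs a few arithmetic operations, so it can replace the simulation inside optimization loops or parameter sweeps where the full model would be prohibitively expensive.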

Similarly, scientific infrastructure—accelerators, light sources, networks, computation and data resources—has reached scales and complexities that require the use of ML for tasks such as anomaly detection in operational data (e.g., for cybersecurity). Moving from today’s fixed rules-based operating procedures to the use of AI algorithms that factor in real-time analysis will be indispensable for optimizing the performance and energy use of increasingly complex, large-scale infrastructure. New DL methods are required to detect anomalies and optimize operating parameters, with additional potential to predict failures as well as to discover new optimization algorithms and novel mechanical or externally induced threats.

The DOE computing facilities such as Summit, Perlmutter, Aurora, and Frontier will simultaneously support the development of existing large-scale simulations, new hybrid HPC models with AI surrogates, and the exploration of new types of generative models emerging from multimodal data streams and sources. Future systems envisioned over the next decade may need to support even richer workloads of traditional HPC and next-generation AI-driven scientific models.

AI will not magically address these and the other opportunities and challenges discussed in this report. Much work will be required within all science disciplines, across science infrastructure, and in the theory, methods, software, and hardware that underpin AI methods.

The use of AI to design and tune hardware systems—whether exascale workflows, national networks, or smart energy infrastructure—will require the development and evaluation of a new generation of AI frameworks and tools that can serve as building blocks that can be adapted and reused across disciplines and across heterogeneous infrastructure.

Bringing AI to any specific domain—whether it is nuclear physics or biology and life sciences—will demand significant effort to incorporate domain knowledge into AI systems, quantify uncertainty, error, and precision, and appropriately integrate these new mechanisms into state-of-the-art computational and laboratory systems.

The overflowing attendance at the AI for Science town halls, the level of enthusiasm and the engagement of attendees, the number of spontaneous AI projects throughout every scientific discipline, and the commitment to growth in this area at the nation’s premier laboratories all combine to indicate that the DOE scientific community is ready to explore and further the transformational potential of AI through 2030 and beyond.

Source: Argonne

Download the Report
