Argonne, University Researchers Team to Make Data FAIR for AI


In a new study, researchers from Argonne National Laboratory, Massachusetts Institute of Technology, University of California San Diego, University of Minnesota, and University of Illinois at Urbana-Champaign have laid out new practices to guide the curation of high-energy physics datasets and make them more FAIR: more findable, accessible, interoperable and reusable.

Data is the lifeblood of scientific research. Collecting, organizing and sharing data both within and across fields drives pivotal discoveries. But making data open and available only answers part of the question about how different scientists — often with very different training — can draw useful conclusions from the same dataset. To promote and guide the cultivation and exchange of data, researchers have developed a set of principles that could make the data more FAIR for both people and machines.

Although these FAIR principles were first published in 2016, researchers are still figuring out how they apply to particular datasets. “The FAIR principles were created to serve as goals for data producers and publishers to improve data management and stewardship practices,” said Argonne computational scientist Eliu Huerta, an author of the study. “The community expects that adhering to these principles will enhance the capabilities of machines to automate the finding and use of data, thereby streamlining the reuse of data for humans.”

The research, published in Nature Scientific Data, demonstrates how to FAIRify an open simulation dataset drawn from particle physics experiments at the CERN Large Hadron Collider. To highlight the interplay between artificial intelligence (AI) research and scientific visualization, this study also provided software tools to visualize and explore this FAIR dataset.

In addition to building FAIR datasets, Huerta and his colleagues also sought to understand the FAIRness of AI models. “To have a FAIR AI model, we believe you need to have a FAIR dataset to train it on,” said Yifan Chen, the first author of the paper and a graduate student at Illinois and Argonne’s Data Science and Learning division. “Applying the FAIR principles to AI models will automate and streamline the design and use of those models for scientific discovery.”

“Our goal is to shed new light into the interplay of AI models and experimental data and help create a rigorous framework for the development of AI tools to address the biggest challenges in science,” Huerta added.

Ultimately, Huerta said, the goal of FAIRness is to create an agreed-upon set of best practices and methodologies, which will maximize the impact of AI and pave the way for the development of next-generation AI tools.

“We’re looking at the entire discovery cycle, from data production and curation, design and deployment of smart and modern computing environments and scientific data infrastructures, and the combination of these to create AI frameworks that greatly advance our understanding of scientific phenomena,” he said.

source: Jared Sagoff, Argonne