SKA and CERN Sign Big Data Agreement

Dr. Fabiola Gianotti, CERN Director-General, and Prof. Philip Diamond, SKA Director-General, signing a cooperation agreement between the two organizations on Big Data

Today the SKA Organization and CERN signed an agreement formalizing their growing collaboration in the area of extreme-scale computing. The agreement establishes a framework for collaborative projects that address joint challenges in approaching Exascale computing and data storage. It comes as the Large Hadron Collider is set to generate even more data in the coming decade and the Square Kilometre Array radio telescope prepares to collect a vast amount of scientific data of its own.

“The signature of this collaboration agreement between two of the largest producers of science data on the planet shows that we are really entering a new era of science worldwide”, said Prof. Philip Diamond, SKA Director-General. “Both CERN and SKA are and will be pushing the limits of what is possible technologically, and by working together and with industry, we are ensuring that we are ready to make the most of this upcoming data and computing surge.”

Around the world, countries are engaged in efforts to cope with a leap in the demands placed on Information and Communication Technology. The Square Kilometre Array (SKA) project, which will be the world’s largest radio telescope when built, and CERN’s Large Hadron Collider (LHC), the world’s largest particle accelerator, famous for the discovery of the Higgs boson, will contribute to driving the required technological developments.

“The LHC computing demands are tackled by the Worldwide LHC Computing Grid, which employs more than half a million computing cores around the globe interconnected by a powerful network. As our demands increase with the planned intensity upgrade of the LHC, we want to expand this concept, by using common ideas and infrastructure, into a scientific cloud. SKA will be an ideal partner in this endeavour,” said Prof. Eckhard Elsen, CERN Director of Research and Computing.

CERN and SKA have identified the acquisition, storage, management, distribution, and analysis of scientific data as particularly pressing topics in meeting these technological challenges.

In the case of the SKA, it is expected that phase 1 of the project – representing approximately 10% of the whole SKA – will generate around 300 PB (petabytes) of data products every year. This is ten times more than today’s biggest science experiments.
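To put the 300 PB figure in perspective, the short sketch below converts a yearly data volume into an average sustained data rate. It is an illustration only: it assumes decimal petabytes (10^15 bytes), a 365-day year, and perfectly even data production, none of which is stated in the announcement, and the function name is hypothetical.

```python
# Rough conversion of a yearly data volume into an average sustained rate.
# Assumptions (not from the announcement): decimal petabytes, a 365-day year,
# and data produced evenly throughout the year.

PETABYTE = 10**15                    # bytes, decimal definition (assumption)
SECONDS_PER_YEAR = 365 * 24 * 3600   # 365-day year (assumption)


def average_rate_gbit_per_s(petabytes_per_year: float) -> float:
    """Return the average rate in gigabits per second for a yearly volume."""
    bytes_per_second = petabytes_per_year * PETABYTE / SECONDS_PER_YEAR
    return bytes_per_second * 8 / 1e9


# SKA phase 1 figure quoted in the article: about 300 PB of data products per year.
ska1_rate = average_rate_gbit_per_s(300)
print(f"SKA phase 1 average rate: about {ska1_rate:.0f} Gbit/s")
```

Under these assumptions the yearly volume corresponds to an average on the order of tens of gigabits per second, sustained around the clock.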

CERN has just surpassed the 200 PB mark for raw data collected by the experiments at the LHC over the past seven years. A layered (tiered) system provides for data storage in the remote centers. The High-Luminosity LHC is estimated to exceed this level every year.

“This in itself will be a challenge for both CERN and SKA given the step change in the amounts of data we will have to handle in the next 5-10 years”, explains Miles Deegan, High-Performance Computing Specialist for the SKA. “Transferring an average dataset will take days on the SKA’s ultra-fast fibre optic networks, which are 300 times faster than your average broadband connection, so storing or even downloading this data at home or even at your local university is clearly impractical.”

As is already the case at CERN, SKA data will be analyzed by scientific collaborations distributed across the planet. Both institutions and their respective researchers will have common computational and storage needs, and they share the challenge of taking this volume of data and turning it into science that can be published, understood, explained, reproduced, preserved and presented.

“Processing such volumes of complex data to extract useful science is an exciting challenge that we face”, adds Antonio Chrysostomou, Head of Science Operations Planning for the SKA. “Our aim is to provide that processing capability through an alliance of regional centers located across the world in SKA member countries. Using cloud-based solutions, our scientific community will have access to the equivalent of today’s 35 biggest supercomputers to do the intensive processing needed to extract scientific results. In short, we need to fundamentally change how science is done.”

“CERN has proposed the concept of the Federated Open Science Cloud with other EIROforum members. This agreement is an important step in this direction,” said Ian Bird, responsible at CERN for the Worldwide LHC Computing Grid. “Essentially, we will provide a giant cloud-based, Dropbox-like, facility to science users around the world, where they will be able to not only access incredibly large files, but will also be able to do extremely intensive processing on those files to extract the science.”

As part of the agreement, CERN and SKA will hold regular meetings to monitor progress and discuss the strategic direction of their collaboration. They will organize collaborative workshops on specific technical areas of mutual interest and propose demonstrator projects or prototypes to investigate concepts for managing and analyzing Exascale data sets in a globally distributed environment. The agreement also includes the exchange of experts in the field of Big Data as well as joint publications.