Interview: EUDAT to Bring Collaborative Data Infrastructure to ISC'13


With the ISC’13 International Supercomputing Conference coming up in June, the time is right to check in with new voices in the European HPC community. This week, I caught up with Damien Lecarpentier, David Manset, and Adam Carter from EUDAT, an organization with a mission to build a Collaborative Data Infrastructure for the EU.

insideHPC: Who is EUDAT and who do you help?

EUDAT Team: EUDAT is a new pan-European data initiative bringing together a unique consortium of 25 partners, including research communities, national data and HPC centers, technology providers, and funding agencies from 13 countries. EUDAT aims to build a sustainable cross-disciplinary and cross-national data infrastructure providing a set of shared services to access and preserve research data.

The services being designed in EUDAT are thus of interest to a broad range of research communities and researchers that lack robust data infrastructures, or that are simply looking for additional storage and/or computing capacity to better access, use, re-use, and preserve their data. Five large research communities initially joined the project as partners and are contributing to the design of the infrastructure and its services. These communities come from linguistics (CLARIN), solid earth sciences (EPOS), climate sciences (ENES), environmental sciences (LIFEWATCH), and biological and medical sciences (VPH). Other communities have expressed strong interest in EUDAT and are associated with the work by sharing their requirements, providing feedback on the services being designed, and in some cases participating in service pilots. These communities come from a wide variety of fields: environmental sciences (ICOS, EMSO, EURO-VO, ENVRI), biomedical sciences (DIXA, ECRIN, BBMRI, INCF), physical sciences (PANDATA, EISCAT), and social sciences and humanities (DARIAH, CESSDA).

Altogether, EUDAT has established contact with 20 major European research communities which are actively involved in the service design process and the shaping of the future infrastructure.

insideHPC: You are a first-time exhibitor at ISC. What will you be showing in your booth this year?

EUDAT Team: This year, EUDAT will showcase at its booth the Collaborative Data Infrastructure (CDI) it is developing to support researchers from all fields of science in (1) temporarily storing and sharing data, (2) long-term archiving and curating of scientific data, and (3) transporting data to computing centers for complex processing. Visitors to the booth will receive dissemination materials, goodies, and detailed information on how to join EUDAT.

insideHPC: EUDAT aims to help users with a “Collaborative Data Infrastructure.” What do you mean by that?

EUDAT Team: The concept of a Collaborative Data Infrastructure (CDI) emerged from the work of the High Level Expert Group on Scientific Data (HLEG) and was presented in the Riding the Wave report as a possible collaboration framework whereby centers offering community-specific support services to their users could rely on a set of common data services shared between different research communities. Although research communities from different disciplines have different ambitions and approaches, particularly with respect to data organization and content, they also share many basic service requirements. This commonality makes it possible for EUDAT to establish common data services, designed to support multiple research communities, as part of this CDI.

The benefits associated with creating such a collaborative framework are many and will result in better exploitation of synergies: (1) By providing generic services to existing scientific communities, EUDAT will enable these communities to focus a greater part of their effort and investment on services that are discipline-specific. (2) It will also provide individual researchers, smaller communities, and projects lacking tailored data management solutions with access to sophisticated shared services, thus removing the need for large-scale capital investment in infrastructure development. (3) Lastly, the EUDAT research infrastructure will facilitate interoperability between existing infrastructures, enable multiple users, projects, disciplines, and regions to share data, and support data-intensive research collaborations.

insideHPC: EUDAT will conduct two training courses this year. What skills will students learn and how does one register?

EUDAT Team: EUDAT plans to run several training courses this year, including training at community events, technical workshops on the EUDAT infrastructure, and a series of online webinars, each on a different subject related to EUDAT’s work. We’ll look at things such as sharing data, moving data, and making data more useful through the use of metadata and identifiers, and we’ll also consider topics such as data-intensive computation and data-centric workflows. In particular, we’ll look at these from the point of view of large-scale data infrastructure and the services that are being put together by the EUDAT project.

Our aim, as far as training in the project is concerned, is to be driven by the needs of the end-user communities. For this reason, we plan to run several community-focused training events over the next two years, which we will co-locate with existing conferences and workshops. In general, these community courses will be aimed at a fairly broad audience and will cover various aspects of the wide area of data. We are already working to tailor training for those communities that have been involved in EUDAT from the outset (CLARIN, ENES, EPOS, Lifewatch and VPH), but we’re also very keen to work with other communities as they join up with EUDAT.

In addition to the community training courses, we will run cross-community training targeted at data centre managers and people who are involved in managing other people’s data. These courses will focus more on the details of interacting with the EUDAT infrastructure and the technologies used. We’ll be advertising all of these courses on the EUDAT website, and there will be links there to allow interested people to sign up once the times and locations are fixed. The community events will also be advertised through the communities’ normal communication channels.

A good opportunity to catch up on our training activities will be the 2nd EUDAT Conference, which will be held in Rome on 28-30 October and during which we plan to hold a full-day training session.

insideHPC: You have 25 European partners. Are you looking to expand?

EUDAT Team: EUDAT is interested in engaging with additional stakeholders, in particular research communities interested in using and developing the services we are offering, but also with anyone willing to contribute to the development of the CDI. Interested organisations and institutions can already join the Consortium as Observers or Associate Partners, which are two efficient ways of following and contributing to the work in progress. As a pan-European initiative, EUDAT must have broad coverage, not only in geographical terms but also in terms of scientific representation, and this will be taken into account in future expansion plans.

insideHPC: ISC brings in scientists and engineers from around the globe. Is this what attracted you to participate in the conference?

EUDAT Team: There are a number of things that make ISC compelling. This year, EUDAT’s booth will focus on industry. Our aim at ISC is to organize and support networking activities and to welcome industry representatives from innovative sectors such as Cloud computing, Big Data and Data Analytics. At the booth, visitors will be given a tour of EUDAT services and facilities, and discussions on public-private partnerships as well as long-term collaborations will be encouraged. Given the significant industry attendance that ISC draws every year, it is a very interesting opportunity for EUDAT’s outreach.