A challenge for data management
Jonathan Taylor reviews the data-management challenges facing neutron sources and other large-scale research facilities as they strive to maximize the scientific and economic impact of their work
Europe’s photon and neutron science community is large. Around 40,000 researchers currently use the continent’s synchrotron and neutron sources, and many of them work at multiple facilities during a single investigation, taking advantage of the different capacities and capabilities of each.
The scale of the data-management task is significant. Neutron scattering experiments can generate up to 50 terabytes of data. Moreover, the changing profile of facility users means that researchers often pull together data generated with multiple instruments at several different facilities and combine these data in a single publication.
A further issue is that data from neutron and synchrotron experiments are highly diverse. With a single facility completing 3000–4000 experiments per year across a suite of typically 30–40 instruments, simply finding a particular data set or data file becomes a challenge in itself.
Giving open access to data in a meaningful way means adhering to the so-called FAIR principles, which state that data should be findable, accessible, interoperable and reusable.
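To make this concrete, below is a minimal sketch in Python of what a FAIR-style metadata record for a single experiment might look like. All field names, identifiers and values are illustrative assumptions for this article, not a real facility schema or catalogue API.

    # Minimal sketch of a FAIR-style metadata record for one experiment.
    # Every field name and value here is an illustrative assumption.
    record = {
        "identifier": "doi:10.0000/example-neutron-dataset",  # findable: persistent ID
        "title": "SANS study of polymer micelles",
        "instrument": "SANS-1",                               # hypothetical instrument
        "facility": "Example Neutron Source",
        "proposal_id": "2017-0042",
        "access_url": "https://data.example.org/2017-0042",   # accessible: resolvable link
        "format": "NeXus/HDF5",                               # interoperable: community format
        "licence": "CC-BY-4.0",                               # reusable: clear terms of use
        "keywords": ["small-angle scattering", "soft matter"],
    }

    def is_findable(rec: dict) -> bool:
        """A data set is only findable if it carries a persistent identifier
        and enough descriptive metadata for a catalogue search to locate it."""
        return bool(rec.get("identifier")) and bool(rec.get("keywords"))

    print(is_findable(record))  # True

The point of the sketch is that findability is a property of the metadata, not of the data files themselves: without a persistent identifier and searchable descriptors, one experiment among thousands cannot be retrieved.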
Many facilities offer commercial access to beam time, and in those circumstances data-management policies must also meet the requirements of commercial visitors. Here there is less commonality between large-scale facilities: each has its own access mechanisms for industrial users. When commercial players pay directly for access to facilities and support, open-access data policies do not apply; in practice, the exact policy for data arising from commercial access is set on a case-by-case basis. For academic users, by contrast, the fact that European facilities already work within an open-data framework offers great benefits, not least because many scientific journals now require authors to provide citable data.
All of these factors – increased data rates, higher experiment complexity, a user community that depends on multiple facilities, and a demand for open access – combine to create a significant data-management challenge. It is, however, a challenge that is being taken very seriously both at the facilities themselves and by government at the European level.
Jonathan Taylor is group leader for instrument data and acting head of the data-management and software centre at the European Spallation Source.
[Source: Physics World, October 2017]
The best example is SINE2020 (Science and Innovation with Neutrons in Europe in 2020), a European research and innovation (R&I) project funded by the Horizon 2020 (H2020) programme. SINE2020 is a consortium of 18 partner institutions from 12 countries.
It has two objectives:
1. preparing Europe for the unique opportunities at the European Spallation Source (ESS) in 2020
2. developing the innovation potential of neutron Large Scale Facilities (LSFs)
The approach includes, among other things, improvements to the user services for sample preparation (with a focus on chemical deuteration and crystal growth), sample environment and data treatment at LSFs, based on standardization across facilities.
Data treatment and analysis in particular are key to maximizing the scientific impact of results, and are arguably among the most important aspects of the scientific workflow for the user community.
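A small sketch may help show why standardization matters here: if facilities converge on a common format such as NeXus/HDF5, a single analysis script can read data from any of them. The group and dataset paths below ("entry/title", "entry/data/counts") follow common NeXus conventions but vary between instruments, so treat them, like the file name, as assumptions for illustration.

    # Minimal sketch of facility-independent data treatment, assuming the
    # experiment was saved in the NeXus/HDF5 format used across neutron and
    # photon facilities. Paths and dataset names are illustrative assumptions.
    import h5py

    def summarize_nexus(path: str) -> None:
        """Print basic metadata from a NeXus file, regardless of which
        facility or instrument produced it."""
        with h5py.File(path, "r") as f:
            title = f["entry/title"][()]          # experiment title (scalar dataset)
            counts = f["entry/data/counts"][...]  # hypothetical detector counts array
            print(f"title: {title}")
            print(f"detector frames: {counts.shape}, total counts: {counts.sum()}")

    # Usage (file name is illustrative):
    # summarize_nexus("sans_2017_0042.nxs")

The design point is that the analysis code depends only on the agreed format, not on the instrument or facility, which is exactly what allows users who collect data at several sources to combine them in a single workflow.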