Data Sharing

last updated at 2022-05-23

The merits of data sharing

Research is a collaborative endeavour that builds on interaction and efficient knowledge exchange between researchers. We share research data to get input from peers and to initiate, elaborate or expand prospective or existing collaborations. Data sharing allows us to save time and resources, e.g., by finding partners to plan or perform investigations together, sharing common pipelines for data analysis, or preventing redundant or overlapping investigations simply by knowing what peers are already investigating. Sharing research data is thus key to successful research projects.

However, data sharing is frequently hindered, even between researchers who work closely together. There may be legal reasons, including unclear policies from funding agencies or institutions: "Who am I allowed to share my data with?", "How do I handle data requiring specific precautions for data security or intellectual property rights?". There may also be social or emotional reasons: researchers may not know which peers are interested in their data ("How do I know who would like to see my data, if they do not know it exists?") or may be afraid to "lose" their data ("Once I share my data, someone else will publish and get credit for it"). Recent developments in open science have boosted scientific advancement. However, it is a common misconception about the FAIR principles of data stewardship that accessible data means public and openly accessible data.

Most researchers, however, want to share their data and are well aware of what data to share with whom, but face technical or even financial issues: "Where and how can I securely share and integrate research data of multiple types, originating from multiple sources?". The sheer amount of data and the number of data types produced during complex multi-party investigations can easily become overwhelming to handle, costly to store, or too large for the available storage capacity, especially when proper data protection mechanisms are employed.

The one-stop-shop does not exist

Today, many options for sharing and collaborating on data are available and are often consciously or incidentally integrated into daily research routines. These include prominent open source or commercial cloud platforms like Nextcloud, Google Drive, Dropbox, OneDrive and many more. While these are great for synchronous collaboration on typical office data, text files, presentations or simple calculations, they offer limited capacities for data analyses, especially those required for large-scale or complex scientific data. Other solutions specifically designed to accommodate scientific data include electronic lab notebooks to document daily lab routines, or platforms like Galaxy and OMERO to analyze and share data from omics or imaging experiments, respectively.

To varying extents, these platforms offer a mix of options for local and remote, asynchronous and synchronous collaboration, often supported by automated version control to track file version history. Different access modes and access controls, as well as different storage locations, exist to suit various aspects of data security and property rights.

For individual researchers or groups, the data sharing dilemma often lies in the fragmentation of data shares. The more projects collaborate on data of different domains, types or formats, and the more people and groups are involved, the more platforms are used, resulting in a fragmented data landscape that is barely accessible and hard to manage efficiently. As a consequence, (un)published data is still mostly shared through conventional routes, such as direct communication between peers via email, instant messaging, and virtual or in-person meetings (one-to-one or in groups), or presented in more formal contexts such as reports and symposia. As these formats frequently focus on late-stage or final research outputs only, they diminish the chances for collaboration early in an investigation.

Changing the dogma from tool-bound to data-centric: Good data sharing

Which tool or platform is most suitable for the project or data to be shared always depends on the context, so the search for a single best one is inherently error-prone and leads to increased fragmentation. How, then, can data fragmentation be resolved without siloing everything in one place, i.e. yet another platform? To break free from platform dependency, one could turn data sharing habits inside out and switch from a tool perspective to a data-centric perspective. Instead of trying to enforce the use of a specific platform for data sharing, one could use a data format that is suitable for, and migratable between, a wide range of tools and purposes.

To support FAIR data sharing, such a data format requires high flexibility to be adaptable to many data types and sources, long-term persistence through independence from specific data formats (i.e. through extension or conversion to them), and scalability to increasing data volumes. Federated data storage and access allow secure, trusted data sharing with involved parties at different locations across institutional borders. Data protection is further ensured through geo-redundant backup mechanisms. In combination with a version control system that tracks file change history, the federated authentication and authorization system makes it possible to control data access and contribution for proper crediting and provenance tracking. Data sharing is enabled throughout the project lifetime (from idea to unpublished data to publication) by structuring the data in a defined format packaged with descriptive metadata and licenses that provide technically and legally clear terms of data (re-)use. From there, data publication takes as little effort as assigning a persistent identifier, without any need to adapt the data once the associated manuscript is published.
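The idea of packaging data with descriptive metadata and a license can be illustrated with a minimal sketch. It assumes a hypothetical layout: the file names `metadata.json`, `LICENSE` and `manifest.json` and the function `package_dataset` are illustrative only and are not taken from the ARC specification or any DataPLANT tool.

```python
import hashlib
import json
from pathlib import Path


def package_dataset(root: Path, metadata: dict, license_text: str) -> dict:
    """Place metadata, a license, and a checksum manifest next to the data.

    Hypothetical layout for illustration; not the ARC specification.
    """
    root = Path(root)

    # Legal terms of (re-)use travel with the data.
    (root / "LICENSE").write_text(license_text, encoding="utf-8")

    # Descriptive metadata in a tool-independent, plain-text format.
    (root / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )

    # One SHA-256 checksum per file (data, metadata and license) so that
    # recipients can verify the package after transfer.
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.name != "manifest.json":
            relative_name = path.relative_to(root).as_posix()
            manifest[relative_name] = hashlib.sha256(path.read_bytes()).hexdigest()

    (root / "manifest.json").write_text(
        json.dumps(manifest, indent=2), encoding="utf-8"
    )
    return manifest
```

Because the package consists only of plain files in a directory, it can be moved between platforms or put under version control without depending on any single tool.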

How does DataPLANT support me in data sharing?

The following table gives an overview of DataPLANT tools and services related to sharing data. Follow the link in the first column for details.

Name | Type | Tasks on data sharing
ARC (Annotated Research Context) | Standard | Structure: package data with metadata in a defined format
DataHUB | Service | Share: infrastructure-as-code, on-premise solution; federated system to share ARCs; manage who can view or access your ARC

Register with DataPLANT

In order to use the DataHUB and other DataPLANT infrastructure and services, please sign up with DataPLANT.

DataPLANT Support

Besides these technical solutions, DataPLANT supports you with community-engaged data stewardship. For further assistance, feel free to reach out via our helpdesk or by contacting us directly.