Home
Fundamentals
Research Data Management
FAIR Data Principles
Metadata
Ontologies
Data Sharing
Data Publications
Data Management Plan
Version Control & Git
Public Data Repositories
Persistent Identifiers
Electronic Lab Notebooks (ELN)
DataPLANT Implementations
Annotated Research Context
ARC specification
ARC Commander
Swate
MetadataQuiz
DataHUB
DataPLAN
Ontology Service Landscape
Manuals
ARC Commander
Setup
Git Installation
ARC Commander Installation
Windows
MacOS
Linux
ARC Commander DataHUB Access
Before we start
Central Functions
Initialize
Clone
Connect
Synchronize
Configure
Branch
ISA Metadata Functions
ISA Metadata
Investigation
Study
Assay
Update
Export
ARCitect
Installation - Windows
Installation - macOS
Installation - Linux
QuickStart
QuickStart - Videos
ARCmanager
What is the ARCmanager?
Connect to your DataHUB
View your ARCs
Create new ARCs
Add new studies and assays
Upload files
Add metadata to your ARCs
Swate
QuickStart
QuickStart - Videos
Annotation tables
Building blocks
Building Block Types
Adding a Building Block
Filling cells with ontology terms
Advanced Term Search
File Picker
Templates
Contribute Templates
ISA-JSON
DataHUB
Overview
User Settings
Generate a Personal Access Token (PAT)
Projects Panel
ARC Panel
Forks
Working with files
ARC Settings
ARC Wiki
Groups Panel
Create a new user group
CQC Pipelines & validation
Find and use ARC validation packages
Data publications
Passing Continuous Quality Control
Submitting ARCs with ARChigator
Track publication status
Use your DOIs
Guides
ARC User Journey
Create your ARC
ARCitect QuickStart
ARCitect QuickStart - Videos
ARC Commander QuickStart
ARC Commander QuickStart (Experts)
Annotate Data in your ARC
Annotation Principles
ISA File Types
Best Practices For Data Annotation
Swate QuickStart
Swate QuickStart - Videos
Swate Walk-through
Share your ARC
Register at the DataHUB
DataPLANT account
Invite collaborators to your ARC
Sharing ARCs via the DataHUB
Work with your ARC
Using ARCs with Galaxy
Computational Workflows
CWL Introduction
CWL runner installation
CWL Examples
CWL Metadata
Recommended ARC practices
Syncing recommendation
Keep files from syncing to the DataHUB
Managing ARCs across locations
Working with large data files
Adding external data to the ARC
ARCs in Enabling Platforms
Publication to ARC
Troubleshooting
Git Troubleshooting & Tips
Contribute
Swate Templates
Knowledge Base
Teaching Materials
Events 2023
Nov: CEPLAS PhD Module
Oct: CSCS CEPLAS Start Your ARC
Sept: MibiNet CEPLAS Start Your ARC
July: RPTU Summer School on RDM
July: Data Steward Circle
May: CEPLAS Start Your ARC Series
Start Your ARC Series - Videos
Events 2024
TRR175 Becoming FAIR
CEPLAS ARC Trainings – Spring 2024
MibiNet CEPLAS DataPLANT Tool-Workshops
TRR175 Tutzing Retreat
Frequently Asked Questions
last updated at 2023-08-04
About this guide
In this guide we explore how the ARC can help streamline data flows and project management in enabling platforms.
Before we can start
It helps to be familiar with
- ☑️ the concept of the ARC
- ☑️ the different ISA file types
- ☑️ the DataHUB
- 💡 Note that this guide can only stay at a very abstract level. Feel free to contact us to work out a suitable solution for your platform together.
Enabling Platforms
Central enabling platforms can range from small single-lab services to large instrumentation-based research infrastructures such as core facilities. Platforms offer a specific set of methods and assays as a routine support to a community of researchers. Typical examples with application in plant sciences include platforms offering "omics" technologies (genome, transcriptome, proteome, metabolome), phenotyping, (microscopic) imaging, genetic engineering or plant growth facilities.
Project communication and data flow
For simplicity, in this guide we call researchers, clients and customers approaching the platform "collaborators".
collaborator
platform team
Central platforms interact with many versatile collaborators. Most platforms have established workflows or routines for (i) project initiation, (ii) submission of samples to be assayed, (iii) exchange of and access to generated data. This communication typically includes a ping-pong of meetings and emails to shape the study in mind, elaborate the biological question and hypothesis, define the most suitable method offered by the platform. To effectively process the project, the platform raises requirements for how samples need to be prepared and submitted.
Project management
As a platform you manage a lot of projects in parallel. Keeping these projects up-to-date and coordinated between collaborators and platform team members is challenging.
Here's a few tips to support your project management:
- You can use the wiki associated to the ARC to collect meeting minutes with your collaborators
- You can use the ARC's issue board to coordinate tasks between collaborators, team members, data analysts and others involved
- You can use your established system of identifiers (e.g. for projects, samples) in ISA metadata
- You can also keep naming your ARCs with the same way you are used to name your project folders
Streamlined data exchange
During project collaboration a lot of information including metadata is exchanged.
Here's an idea what your data flow could look like with the ARC:
- Data exchange between collaborator and platform occurs through the ARC shared via the DataHUB. Depending on the phase of the project this involves different data and information.
(1a) The collaborator primarily describes the investigation's goal and study design, annotates the submitted samples with metadata and adds associated protocols. Once available the collaborator receives assayed data and relevant information.
(1b) The platform team receives relevant sample metadata to run the respective assay and complements the ARC with platform-specific assay metadata, protocols and standard operating procedures (SOPs).
- A clone of the ARC is stored locally at the platform.
- From the platform data source, e.g. a workstation attached to your assay device, the generated data can directly be written or transferred to the assay dataset folder of your ARC stored in the file storage
A few tips to streamline your data exchange:
- You can use metadata templates
- to communicate with collaborators and team members, which metadata is crucial for specific assays
- You can also store filled-out templates for your routine measurements as a reusable JSON file
Data storage, handling and transfer
Your platform produces a lot of valuable raw data. Depending on your setup and the type of data you generate, the data is stored on individual machines, hard drives, institutional or internal file shares. For instance, workstations directly associated with measuring devices require quickest data transfer rates or specific security restrictions and write data to proximal data stores rather than a cloud storage.
The DataHUB can help as a platform to centralize data from different sources and function as a geo-redundant backup with tracked changes:
- No matter where you work (office desktop, remote laptop, workstation), you can always sync your current state of the ARC via the DataHUB
- You can follow the evolution of a project via ARC's version control
Track changes and provenance
Different people are involved in the projects passing your platform. From team members to internal and external collaboration partners (collaborators). This makes it sometimes hard to keep track of who contributed how, what, when, why and to manage who needs access to which data.
- The DataHUB facilitates access management across institute boarders
- ARCs document changes and contributions
- The ISA metadata model allows to associate contributors with investigations and studies. List your team members to ensure proper credit for their contributions
Routine computations
Your platform supports researchers with a specific set of methods and assays. You perform routine measurements, follow established protocols and SOPs to generate specific types of samples and data. Aligned with your lab workflows, you also follow computational routines: For the same types of (raw) data you re-use computational steps for processing and analysis.
Reusing computational workflows across projects helps comprehensibility for your collaborators and ensures reproducibility and quality for your data.
The standardized ARC structure helps with routine computations:
- The ARC's simple directory structure itself helps building routines, no matter whether you work with code or licensed software. Across projects, you and your collaborators know, where to find metadata and raw data, where to store processed data and results.
- The ARC facilitates task automation such as quality control and validation within one project or across multiple ARCs covering routine measurements
- Code-based computations can be designed as reusable and reproducible workflows using Common Workflow Language (CWL)
Data publication
At some point you and your collaborator want to publish the data resulting from the project.
There are two major options: You can publish the current version of the ARC and receive a DOI. Some scientific journals require data to be published in domain-specific repositories, specialized for a certain type of assay. For this purpose converters help you export the relevant dataset and metadata into a repository-accepted format.
💡 We are working on converters to read and reshape the relevant data and metadata of your ARC into a format accepted by domain-specific repositories. You can support this by telling us relevant repositories for your type of data or help creating templates.
Preparing your platform to benefit from the ARC
Here we recommend steps that you can take to prepare your platform towards using ARCs.
They are optional and it depends on your platform, whether these are suitable.
☑️ Create an example ARC
- Try to structure a typical experiment run in your platform as an ARC
- Consider what information is required to reproduce the experiment from source-to-data
- What information is provided by the collaborator or by the platform?
- Which protocols (materials and methods) are required?
- At what step and how does the collaborator "hand over" or submit?
☑️ Share your example ARC via the DataHUB
- An example ARC helps collaborators understand the ARC concept and structure with your type of data and metadata
- Where can they find what information (protocols, datasets, results)?
- How can they interact with the ARC (upload, download, edit)
☑️ Create a DataHUB group for your platform
- A group can help you organize all running projects in one place
- You can easily manage access for multiple ARCs
☑️ Design metadata templates for sample annotation
- Let your collaborators know, what information you need from them
- You can either use those templates yourself, when initiating a new ARC for collaboration or let collaborators select the template directly via Swate
☑️ Write a short guide for your collaborators
- Especially for new collaboration partners, it helps to summarize your platforms routine with a short guide.
- How does your platform incorporate ARCs and what are the exact steps for interaction between you and the collaboration partner?
- You could share this guide via this knowledge base or attach it to your example ARC or group in the DataHUB.
☑️ Start packaging your projects as ARCs 🚀
Meet your collaborators in an ARC
You are all set and ready to use ARCs for collaboration in your platform.
Here we try to address a few questions from discussions with platform heads.
Who initiates the ARC: platform or collaborator?
You decide whether you prepare the ARC for you collaborators (scenario A) or meet them half the way (scenario B).
Eventually this depends on different factors, e.g. the type of collaboration you agreed upon or whether or not the collaborators are used to work with the ARC and associated tools.
Following the exemplary scenario A, you could setup the ARC for your collaboration (1), add the relevant studies and assays (2) as well as templates (3) as discussed with them and ask them to complement the required metadata (4) and protocols (5) before you can run the assays and add the dataset (6).
Alternatively, collaborators already working with ARCs could invite you to "their" ARC (exemplary scenario B). They can independently set up the ARC and fill metadata (4) based on your prepared templates (3).
💡 In scenario B the collaborator might invite you to a very large ARC with data not really relevant for your platform-specific collaboration. In this case you might want to exclude irrelevant data or avoid downloading large data when syncing the ARC.
Can I retain my established naming convention for project management and data storage?
Running a central platform, you probably follow an established project management system or naming convention.
This is particularly important for how project folders are named on your platform's data storage. When implementing ARCs, you do not need to change this system.
You can simply name the ARCs the same way you are used to name your project folders.
This is also true for scenario B exemplified above. You can simply fork the collaborator's ARC and rename it according to your system.
💡 You might want to consider requiring your collaborators to name the assays (folder name or assay identifier) according to your project management system.
How can I share ARCs with non-ARC collaborators?
Not all of your collaborators use ARCs or are planning to do so, but you still want to interact with those collaborators in the same routines employed with everyone else. They only need a DataHUB account and they can simply sign in with their existing scientific account.
The DataHUB comes with built-in features that allow interaction with the ARC solely via the web browser without any addional tools.
DataPLANT Support
Besides these technical solutions, DataPLANT supports you with community-engaged data stewardship. For further assistance, feel free to reach out via our
helpdesk
or by contacting us
directly
.