last updated at 2023-10-24
About this guide
In this guide we collect recommendations and thoughts on creating an ARC based on a publication and associated published datasets.
This is not the typical entry into an ARC, but rather a retrospective one. It might, however, help to build community-tailored showcases, i.e. to show what a project could look like as an ARC.
Before we can start
This guide assumes you know
ARC setup
💡 The underscore "_" can help to distinguish additional folders ("additional payload") from default ARC folders.
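The tip above can be checked with a small script. This is only a sketch: it assumes the underscore is used as a leading character of the folder name, and the folder name `_publication` in the usage note is a hypothetical example, not a DataPLANT convention. The default top-level ARC folders are taken to be `studies`, `assays`, `workflows` and `runs`.

```python
from pathlib import Path

# Top-level folders that belong to the default ARC structure.
ARC_DEFAULT_FOLDERS = {"studies", "assays", "workflows", "runs"}


def classify_top_level_folders(arc_root: Path) -> dict[str, list[str]]:
    """Group the top-level folders of an ARC by how they are marked."""
    groups: dict[str, list[str]] = {"default": [], "additional": [], "unmarked": []}
    for entry in sorted(arc_root.iterdir()):
        if not entry.is_dir() or entry.name.startswith("."):
            continue  # skip files and hidden folders such as .git
        if entry.name in ARC_DEFAULT_FOLDERS:
            groups["default"].append(entry.name)
        elif entry.name.startswith("_"):
            # Assumption: a leading underscore marks "additional payload".
            groups["additional"].append(entry.name)
        else:
            groups["unmarked"].append(entry.name)
    return groups
```

Calling `classify_top_level_folders(Path("path/to/your/arc"))` would list a folder such as `_publication` under `"additional"`, and any extra folder without the underscore under `"unmarked"`.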
Legal
💡 We recommend focusing on open access / CC-BY publications and datasets, unless you explicitly know whether and how you may re-use data published elsewhere.
ISA - investigation / isa.investigation.xlsx
- Add Title: publication title
- Add Description: publication abstract
- Add Public Release Date: publication online date
- Add People: authors in the same order as on the publication
  - Add First Name, Last Name, Affiliation
  - If possible, add Email
  - Try to find and add ORCID
- Add Publication
  - DOI, Title, Authors, Status = Published
💡 Can be done via ARCitect, ARC Commander or Excel (manually editing the isa.investigation.xlsx file)
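Before entering the values in ARCitect, ARC Commander or Excel, it can help to collect them in one place and see what is still open. The following is a minimal sketch of such a checklist. The class and field names are hypothetical and are not part of any DataPLANT tool or of the ISA file format.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    first_name: str
    last_name: str
    affiliation: str
    email: str = ""  # add if possible
    orcid: str = ""  # try to find and add


@dataclass
class InvestigationDraft:
    title: str                # publication title
    description: str          # publication abstract
    public_release_date: str  # publication online date
    people: list[Person] = field(default_factory=list)  # authors in publication order
    publication_doi: str = ""
    publication_status: str = "Published"


def open_items(draft: InvestigationDraft) -> list[str]:
    """Return the names of fields that still need to be filled in."""
    items = []
    for name in ("title", "description", "public_release_date", "publication_doi"):
        if not getattr(draft, name).strip():
            items.append(name)
    if not draft.people:
        items.append("people")
    for position, person in enumerate(draft.people, start=1):
        if not person.email:
            items.append(f"people[{position}].email (optional)")
        if not person.orcid:
            items.append(f"people[{position}].orcid (optional)")
    return items
```

An empty list from `open_items` means every field listed above has a value. The values themselves still have to be transferred to the isa.investigation.xlsx file with one of the tools mentioned.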
ISA - studies and assays
- Identify the "data", i.e. results of experiments.
  - In the ARC, data is produced by "assays"
- Try to categorize and structure the paper into studies and assays
  - Studies are typically sample sets that are used as inputs to multiple assays
  - Unfortunately, samples are not always concisely named in publications. Try to deduce the names from supplemental files, tables and figures.
- Cut the materials and methods (MM) section into protocols
  - These may be studies/.../protocols or assays/.../protocols
  - Store them as markdown files with the MM section as title
- Try to reach a point where one dataset = one assay
  - The struggle is that "datasets" are often not published individually as such, but rather hidden or integrated in figures and tables (both in the original manuscript and in the supplemental files). One needs to find a creative way to extract them from there and store them in the assays' dataset folders.
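Cutting the MM section into protocol files can be partly scripted once the text is available as plain text. The sketch below assumes that each subsection heading sits on its own line and that you supply the list of headings yourself. The function names are hypothetical, and using the heading both as the markdown title and (in PascalCase) as the file name is one possible reading of "with the MM section as title".

```python
import re
from pathlib import Path


def split_methods(text: str, headings: list[str]) -> dict[str, str]:
    """Split a methods text into {heading: body}, cutting at lines equal to a known heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in headings:
            current = stripped
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {heading: "\n".join(body).strip() for heading, body in sections.items()}


def write_protocols(sections: dict[str, str], protocols_dir: Path) -> list[Path]:
    """Write one markdown file per section into the given protocols folder."""
    protocols_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for heading, body in sections.items():
        # Build a PascalCase file name without spaces from the heading.
        words = re.findall(r"[A-Za-z0-9]+", heading)
        file_name = "".join(word[0].upper() + word[1:] for word in words) + ".md"
        path = protocols_dir / file_name
        path.write_text(f"# {heading}\n\n{body}\n", encoding="utf-8")
        written.append(path)
    return written
```

Text that appears before the first known heading is ignored, so it is worth comparing the written files against the original section afterwards.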
File names
- Avoid spaces in file names. We recommend using camelCase or PascalCase for file names.
  - However, to keep track of links and data origin, we recommend keeping the original names of data files (even if a publisher or repository stores files with spaces).
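The naming recommendation can be sketched as a small helper. The function name and its parameters are hypothetical. It only touches names that contain whitespace and leaves original data files unchanged, as recommended above.

```python
import re


def suggest_file_name(name: str, pascal: bool = False, original_data: bool = False) -> str:
    """Suggest a camelCase (or PascalCase) file name for names that contain spaces."""
    # Original data files keep their name so links and data origin stay traceable.
    if original_data or not re.search(r"\s", name):
        return name
    stem, dot, extension = name.rpartition(".")
    if not stem:  # no extension found, treat the whole name as the stem
        stem, dot, extension = name, "", ""
    words = stem.split()
    if not words:
        return name
    parts = [word[0].upper() + word[1:] for word in words]
    if not pascal:
        parts[0] = parts[0][0].lower() + parts[0][1:]
    return "".join(parts) + dot + extension
```

For example, `suggest_file_name("growth protocol.md")` gives `growthProtocol.md`, and with `pascal=True` it gives `GrowthProtocol.md`.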
Original Data
The publication may contain a section "data availability" or "data accession" or similar that references external links (typically a large data repository).
- Try to find and transfer info (sample accessions, IDs, metadata, links, etc.) into the ARC. This would typically be an assay.
💡 There is no clear rule on whether data already published in a public repository should be imported (i.e. copied) into the ARC. The discussion is ongoing.
- For showcasing, it makes sense to build a "complete" ARC that includes the data.
- To minimize data duplication and save storage space, copying the data should be avoided.
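If the data is referenced rather than copied, the accessions and links can still be recorded in the ARC as a small table. The sketch below writes such a table as tab-separated text. The column names are hypothetical, and where the file should be stored inside the assay is left open here.

```python
import csv
import io

# Hypothetical column names for referencing externally stored data.
COLUMNS = ["sample_name", "repository", "accession", "url"]


def accession_table(records: list[dict[str, str]]) -> str:
    """Render external data references as a tab-separated table."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Each record is a dictionary with one value per column. The returned text can be saved to a file in the corresponding assay.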
DataPLANT Support
Besides these technical solutions, DataPLANT supports you with community-engaged data stewardship. For further assistance, feel free to reach out via our helpdesk or by contacting us directly.