
1 Basic Information:

1.1 What is the project name or acronym?

1.2 Who is most likely to benefit from the data?

1.3 Other DMP Metadata

1.4 Please select from the following options

2. What kind of data will you handle?

2.1 Which endpoint repositories will you submit your data to?

3. How much data are you likely to generate?

Raw data (GB):

Derived data (GB):






4. Are any of the following standards relevant to your project?

4.1 Will you adhere to any high level metadata submission standards?

4.2 Project data will be published:

4.4 Will you follow national standards or archive data in national infrastructures?

5. Do you intend to use data visualization in your project?

























The project aim should be written as part of a sentence.

Example 1: aims at creating a computational model of carbon and water flow within a whole plant architecture


Example 2: aims at generating a data management plan with minimal effort and making the data as open as possible

The project object is the target (object) of the study.

Example 1: carbon and water flow in plants


Example 2: data management plan

Here is space for an additional sentence.

Example 1: Industry, politicians and students can also use the data for different purposes.


Example 2: The data acquired in the project can be used by a wide range of people with different purposes.

Information in this section is only used in the DMP metadata and does not appear in the document text.

Data officers are also known as data stewards or data curators.

Proprietary software: software that legally remains the property of the organization, group, or individual that created it.

User-defined template

You can click the dotted box to start editing.
Click the grey buttons to reuse templates.
Click Submit when you have finished.



Data Management Plan of the H2020 Project $_PROJECT


Action Number:

$_FUNDINGPROGRAMME

Action Acronym:

$_PROJECT

Action Title:

$_PROJECT

Creation Date:

$_CREATIONDATE

Modification Date:

$_MODIFICATIONDATE

DMP version:

$_DMPVERSION


1    Introduction

#if$_EU The $_PROJECT is part of the Open Data Initiative (ODI) of the EU. #endif$_EU To best profit from open data, it is necessary not only to store the data but to make it Findable, Accessible, Interoperable, and Reusable (FAIR).#if$_PROTECT We support open and FAIR data; however, we also consider the need to protect individual data sets. #endif$_PROTECT

The aim of this document is to provide guidelines on the principles of data management in the $_PROJECT and to specify which types of data will be stored. This is achieved by using the responses to the EU questionnaire on the Data Management Plan (DMP) as the DMP document.

The detailed DMP states how data will be handled during and after the project. The $_PROJECT DMP is prepared according to the Horizon 2020 and Horizon Europe online manuals. #if$_UPDATE It will be updated, and its validity checked, several times during the $_PROJECT project. At the very least, this will happen at month $_UPDATEMONTH. #endif$_UPDATE

2    Data Management Plan EU Template

2.1    Data Summary

What is the purpose of the data collection/generation and its relation to the objectives of the project?

The $_PROJECT has the following aim: $_PROJECTAIM. Therefore, data collection#if!$_VVISUALIZATION and integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, integration and visualization #endif$_VVISUALIZATION #if$_DATAPLANT using the DataPLANT ARC structure are absolutely necessary #endif$_DATAPLANT #if!$_DATAPLANT through a standardized data management process is absolutely necessary #endif!$_DATAPLANT because the data are used not only to understand principles, but also to inform stakeholders about the provenance of the data and of the analyses performed on them. It is therefore necessary to ensure that the data are well generated and well annotated with metadata using open standards, as laid out in the next section.

What types and formats of data will the project generate/collect?

The $_PROJECT will collect and/or generate the following types of raw data: $_GENETIC, $_GENOMIC, $_TRANSCRIPTOMIC, $_RNASEQ, $_METABOLOMIC, $_PROTEOMIC, $_PHENOTYPIC, $_TARGETED, $_IMAGE, $_MODELS, $_CODE, $_EXCEL, $_CLONED-DNA data, which are related to $_STUDYOBJECT. In addition, the raw data will also be processed and modified using analytical pipelines, which may yield different results or include ad hoc data analysis parts. #if$_DATAPLANT These pipelines will be tracked in the DataPLANT ARC.#endif$_DATAPLANT Therefore, care will be taken to document and archive these resources (including the analytical pipelines) as well#if$_DATAPLANT, relying on the expertise of the DataPLANT consortium#endif$_DATAPLANT.

Will you re-use any existing data and how?

The project builds on existing data sets and relies on them. #if$_RNASEQ|$_GENOMIC For example, without a proper genomic reference it is very difficult to analyze next-generation sequencing (NGS) data sets.#endif$_RNASEQ|$_GENOMIC It is also important to include existing data sets on the expression and metabolic behavior of the $_STUDYOBJECT, and on existing background knowledge#if$_PARTNERS of the partners#endif$_PARTNERS. Genomic references can be gathered from reference databases for genomes and sequences, such as the US National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ). Furthermore, prior 'unstructured' data in the form of publications and data contained therein will be used for decision making.

What is the origin of the data?

Public data will be extracted as described in the previous paragraph. For the $_PROJECT, specific data sets will be generated by the consortium partners.

Data of different types or representing different domains will be generated using unique approaches. For example:

#if$_PREVIOUSPROJECTS

Data from previous projects such as $_PREVIOUSPROJECTS will be considered.

#endif$_PREVIOUSPROJECTS

What is the expected size of the data?

We expect to generate $_RAWDATA GB of raw data and up to $_DERIVEDDATA GB of processed data.

To whom might it be useful ('data utility')?

The data will initially benefit the $_PROJECT partners, but will also be made available to selected stakeholders closely involved in the project, and then the scientific community working on $_STUDYOBJECT. $_DATAUTILITY In addition, the general public interested in $_STUDYOBJECT can also use the data after publication. The data will be disseminated according to the $_PROJECT's dissemination and communication plan#if$_DATAPLANT, which aligns with the DataPLANT platform and other means#endif$_DATAPLANT.

2.2    FAIR data

Making data findable, including provisions for metadata

Are the data produced and/or used in the project discoverable with metadata, identifiable and locatable by means of a standard identification mechanism (e.g. persistent and unique identifiers such as Digital Object Identifiers)?

All datasets will be associated with unique identifiers and will be annotated with metadata. We will use the Investigation, Study, Assay (ISA) specification for metadata creation. The $_PROJECT will rely on community standards plus additional recommendations applicable in plant science, such as the #if$_PHENOTYPIC #if$_MIAPPE MIAPPE (Minimum Information About a Plant Phenotyping Experiment),#endif$_MIAPPE #endif$_PHENOTYPIC #if$_GENOMIC|$_GENETIC #if$_MIXS MIxS (Minimum Information about any (X) Sequence),#endif$_MIXS #if$_MIGSEU MigsEu (Minimum Information about a Genome Sequence: Eukaryote),#endif$_MIGSEU #if$_MIGSORG MigsOrg (Minimum Information about a Genome Sequence: Organelle),#endif$_MIGSORG #if$_MIMS MIMS (Minimum Information about a Metagenome Sequence (environmental)),#endif$_MIMS #if$_MIMARKSSPECIMEN MIMARKSSpecimen (Minimum Information about a Marker Gene Sequence: Specimen),#endif$_MIMARKSSPECIMEN #if$_MIMARKSSURVEY MIMARKSSurvey (Minimum Information about a Marker Gene Sequence: Survey),#endif$_MIMARKSSURVEY #if$_MISAG MISAG (Minimum Information about a Single Amplified Genome),#endif$_MISAG #if$_MIMAG MIMAG (Minimum Information about a Metagenome-Assembled Genome),#endif$_MIMAG #endif$_GENOMIC|$_GENETIC #if$_TRANSCRIPTOMIC #if$_MINSEQE MINSEQE (Minimum Information about a high-throughput SEQuencing Experiment),#endif$_MINSEQE #endif$_TRANSCRIPTOMIC #if$_TRANSCRIPTOMIC #if$_MIAME MIAME (Minimum Information About a Microarray Experiment),#endif$_MIAME #endif$_TRANSCRIPTOMIC #if$_IMAGE #if$_REMBI REMBI (Recommended Metadata for Biological Images),#endif$_REMBI #endif$_IMAGE #if$_PROTEOMIC #if$_MIAPE MIAPE (Minimum Information About a Proteomics Experiment),#endif$_MIAPE #if$_MIMIX MIMIx (Minimum Information about a Molecular Interaction eXperiment),#endif$_MIMIX #endif$_PROTEOMIC These specific standards, unlike cross-domain minimal sets such as Dublin Core (which mostly define the submitter and the general type of data), allow reuse by other researchers by defining properties of the plant material (see the preceding section). However, minimal cross-domain annotations #if$_DUBLINCORE Dublin Core,#endif$_DUBLINCORE #if$_MARC21 MARC 21,#endif$_MARC21 also remain part of the $_PROJECT. #if$_DATAPLANT The core integration with DataPLANT will also allow individual releases to be tagged with a Digital Object Identifier (DOI). #endif$_DATAPLANT #if$_OTHERSTANDARDS Other standards such as $_OTHERSTANDARDINPUT are also adhered to. #endif$_OTHERSTANDARDS
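To illustrate how ISA-style metadata can be captured alongside a dataset, the following minimal sketch (Python, standard library only) builds an Investigation/Study/Assay record and writes it to a JSON file. All identifiers, titles, and file paths are hypothetical placeholders and the field names are illustrative only, not the normative ISA-Tab/ISA-JSON schema or a DataPLANT template.

import json

# Minimal ISA-style record (sketch only; field names are illustrative).
investigation = {
    "identifier": "INV-EXAMPLE-001",           # hypothetical identifier
    "title": "Carbon and water flow in plants",
    "studies": [
        {
            "identifier": "STUDY-001",
            "description": "Drought stress time course",
            "assays": [
                {
                    "identifier": "ASSAY-RNASEQ-001",
                    "measurement_type": "transcription profiling",
                    "technology_type": "RNA-Seq",
                    "data_files": ["rawdata/sample_01_R1.fastq.gz"],  # hypothetical path
                }
            ],
        }
    ],
}

# Write the metadata record next to the data it describes.
with open("isa_metadata.json", "w", encoding="utf-8") as handle:
    json.dump(investigation, handle, indent=2)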

What naming conventions do you follow?

Data variables will be allocated standard names. For example, genes, proteins and metabolites will be named according to approved nomenclature and conventions. These will also be linked to functional ontologies where possible. Datasets will also be named in a meaningful way to ensure readability by humans. Plant names will include traditional names, binomials, and all strain/cultivar/subspecies/variety identifiers.
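As an illustration of such a convention, the short sketch below assembles human-readable dataset names from the organism, cultivar, assay type, date, and version. The exact pattern is a hypothetical example, not a convention mandated by the $_PROJECT or by any repository.

from datetime import date

def dataset_name(genus: str, species: str, cultivar: str,
                 assay: str, version: int, when: date) -> str:
    """Build a human-readable dataset name, e.g.
    'Hordeum_vulgare_cv-Barke_RNASeq_2024-05-01_v1' (hypothetical pattern)."""
    parts = [genus, species, f"cv-{cultivar}", assay,
             when.isoformat(), f"v{version}"]
    # Replace spaces so the name stays file-system friendly.
    return "_".join(p.replace(" ", "-") for p in parts)

print(dataset_name("Hordeum", "vulgare", "Barke", "RNASeq", 1, date(2024, 5, 1)))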

Will search keywords be provided that optimize possibilities for re-use?

Keywords about the experiment and the consortium will be included, as well as an abstract about the data, where useful. In addition, certain keywords can be auto-generated from dense metadata and its underlying ontologies. #if$_DATAPLANT Here, DataPLANT strives to complement existing ontologies with standardized DataPLANT ontology terms where an ontology does not yet include the required variables. #endif$_DATAPLANT

Do you provide clear version numbers?

To maintain data integrity and facilitate reanalysis, data sets will be allocated version numbers where this is useful (e.g. raw data is considered immutable, must not be changed, and therefore does not receive a version number). #if$_DATAPLANT This is automatically supported by the Git-based ARC infrastructure of DataPLANT. #endif$_DATAPLANT

What metadata will be created? In case metadata standards do not exist in your discipline, please outline what type of metadata will be created and how.

We will use the Investigation, Study, Assay (ISA) specification for metadata creation. #if$_RNASEQ|$_GENOMIC For specific data (e.g., RNASeq or genomic data), we use metadata templates from the end-point repositories. #if$_MINSEQE The Minimum Information about a high-throughput SEQuencing Experiment (MINSEQE) will also be used. #endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC The following metadata/minimum information standards will be used to collect metadata: #if$_GENOMIC|$_GENETIC #if$_MIXS MIxS (Minimum Information about any (X) Sequence),#endif$_MIXS #if$_MIGSEU MigsEu (Minimum Information about a Genome Sequence: Eukaryote),#endif$_MIGSEU #if$_MIGSORG MigsOrg (Minimum Information about a Genome Sequence: Organelle),#endif$_MIGSORG #if$_MIMS MIMS (Minimum Information about a Metagenome Sequence (environmental)),#endif$_MIMS #if$_MIMARKSSPECIMEN MIMARKSSpecimen (Minimum Information about a Marker Gene Sequence: Specimen),#endif$_MIMARKSSPECIMEN #if$_MIMARKSSURVEY MIMARKSSurvey (Minimum Information about a Marker Gene Sequence: Survey),#endif$_MIMARKSSURVEY #if$_MISAG MISAG (Minimum Information about a Single Amplified Genome),#endif$_MISAG #if$_MIMAG MIMAG (Minimum Information about a Metagenome-Assembled Genome),#endif$_MIMAG #endif$_GENOMIC|$_GENETIC #if$_TRANSCRIPTOMIC #if$_MINSEQE MINSEQE (Minimum Information about a high-throughput SEQuencing Experiment),#endif$_MINSEQE #endif$_TRANSCRIPTOMIC #if$_TRANSCRIPTOMIC #if$_MIAME MIAME (Minimum Information About a Microarray Experiment),#endif$_MIAME #endif$_TRANSCRIPTOMIC #if$_IMAGE #if$_REMBI REMBI (Recommended Metadata for Biological Images),#endif$_REMBI #endif$_IMAGE #if$_PROTEOMIC #if$_MIAPE MIAPE (Minimum Information About a Proteomics Experiment),#endif$_MIAPE #if$_MIMIX MIMIx (Minimum Information about a Molecular Interaction eXperiment),#endif$_MIMIX #endif$_PROTEOMIC #if$_METABOLOMIC #if$_METABOLIGHTS MetaboLights submission-compliant standards will be used for metabolomic data where this is accepted by the consortium partners.#issuewarning Some metabolomics partners consider MetaboLights not an accepted standard.#endissuewarning #endif$_METABOLIGHTS #endif$_METABOLOMIC As part of the plant research community, we use #if$_MIAPPE MIAPPE for phenotyping data in the broadest sense, but we will also rely on #endif$_MIAPPE specific SOPs for additional annotations#if$_DATAPLANT that consider advanced DataPLANT annotation and ontologies#endif$_DATAPLANT.

Making data openly accessible

Which data produced and/or used in the project will be made openly available as the default? If certain datasets cannot be shared (or need to be shared under restrictions), explain why, clearly separating legal and contractual reasons from voluntary restrictions.

By default, all data sets from the $_PROJECT will be shared with the community and made openly available. However, before the data are released, all partners will be given the opportunity to check for potential intellectual property (IP) issues (according to the consortium agreement and background IP rights). #if$_INDUSTRY This applies in particular to data pertaining to the industry partners. #endif$_INDUSTRY IP protection will be prioritized for datasets that offer the potential for exploitation.

Note that in multi-beneficiary projects it is also possible for specific beneficiaries to keep their data closed if relevant provisions are made in the consortium agreement and are in line with the reasons for opting out.

How will the data be made accessible (e.g. by deposition in a repository)?

Data will be made available via the $_PROJECT platform using a user-friendly front end that allows data visualization. In addition, it will be ensured that data that can be stored in international, discipline-specific repositories are submitted to these repositories, which use specialized technologies:

#if$_GENETIC For genetic data: #if$_GENBANK NCBI-GenBank,#endif$_GENBANK #if$_SRA NCBI-SRA,#endif$_SRA #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #if$_GEO NCBI-GEO,#endif$_GEO #endif$_GENETIC

#if$_TRANSCRIPTOMIC For Transcriptomic data: #if$_SRA NCBI-SRA,#endif$_SRA #if$_GEO NCBI-GEO,#endif$_GEO #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_TRANSCRIPTOMIC

#if$_IMAGE For image data: #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE

#if$_METABOLOMIC For metabolomic data: #if$_METABOLIGHTS EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics Workbench,#endif$_METAWORKBENCH #if$_INTACT IntAct (molecular interactions),#endif$_INTACT #endif$_METABOLOMIC

#if$_PROTEOMIC For proteomics data: #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI ChEBI (Chemical Entities of Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC

#if$_PHENOTYPIC For phenotypic data: #if$_edal e!DAL-PGP (Plant Genomics & Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC

#if$_OTHEREP and $_OTHEREP will also be used to store data and the data will be processed there as well.#endif$_OTHEREP

Unstructured and less standardized data (e.g., experimental phenotypic measurements) will be annotated with metadata and, if complete, allocated a digital object identifier (DOI). #if$_DATAPLANT Whole datasets will also be wrapped into an ARC with allocated DOIs. The ARC and the converters provided by DataPLANT will ensure that the upload into the endpoint repositories is fast and easy. #endif$_DATAPLANT

What methods or software tools are needed to access the data?

#if$_PROPRIETARY The $_PROJECT relies on the tool(s) $_PROPRIETARY. #endif$_PROPRIETARY

#if!$_PROPRIETARY No specialized software will be needed to access the data, just a modern browser. Access will be possible through web interfaces. For data processing after obtaining raw data, typical open-source software can be used. #endif!$_PROPRIETARY

#if$_DATAPLANT DataPLANT offers tools such as the open-source SWATE plugin for Excel, the ARC Commander, and DataPLAN. #endif$_DATAPLANT

Is documentation about the software needed to access the data included?

#if$_DATAPLANT DataPLANT resources are well described, and their setup is documented on the GitHub project pages. #endif$_DATAPLANT All external software documentation will be duplicated locally and stored near the software.

Is it possible to include the relevant software (e.g. in open-source code)?

As stated above, the $_PROJECT will use publicly available open-source and well-documented certified software #if$_PROPRIETARY except for $_PROPRIETARY #endif$_PROPRIETARY.

Where will the data and associated metadata, documentation and code be deposited? Preference should be given to certified repositories that support open access, where possible.

As noted above, specialized repositories will be used for common data types. Unstructured and less standardized data (e.g., experimental phenotypic measurements) will be annotated with metadata and, if complete, allocated a digital object identifier (DOI). #if$_DATAPLANT Whole datasets will also be wrapped into an ARC with allocated DOIs. #endif$_DATAPLANT

Have you explored appropriate arrangements with the identified repository?

Submission is free of charge, and it is the goal (at least of ENA) to obtain as much data as possible. Therefore, special arrangements are neither necessary nor useful, and catch-all repositories are not required. #if$_DATAPLANT This has been confirmed for data associated with DataPLANT. #endif$_DATAPLANT #issuewarning If no data management platform such as DataPLANT is used, you need to find an appropriate repository to store or archive your data after publication. #endissuewarning

If there are restrictions on use, how will access be provided?

There are no restrictions beyond the IP screening described above, which is in line with European open data policies.

Is there a need for a data access committee?

There is no need for a data access committee.

Are there well described conditions for access (i.e. a machine-readable license)?

Yes, where possible; e.g., the Creative Commons Rights Expression Language (CC REL) will be used for data not submitted to specialized repositories such as ENA.
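As a sketch of what such a machine-readable licence statement could look like, the snippet below emits a small JSON-LD record using ccREL terms (Python, standard library only). The dataset URL, title, and attribution details are hypothetical placeholders and the exact record shape is illustrative, not a form prescribed by the $_PROJECT or by Creative Commons.

import json

# ccREL-style licence metadata serialized as JSON-LD (illustrative sketch only).
license_record = {
    "@context": {"cc": "http://creativecommons.org/ns#",
                 "dct": "http://purl.org/dc/terms/"},
    "@id": "https://example.org/datasets/phenotyping-2024",  # hypothetical dataset URL
    "dct:title": "Phenotyping time course 2024",             # hypothetical title
    "cc:license": "https://creativecommons.org/licenses/by/4.0/",
    "cc:attributionName": "$_PROJECT consortium",
    "cc:attributionURL": "https://example.org",               # hypothetical URL
}

with open("license.jsonld", "w", encoding="utf-8") as handle:
    json.dump(license_record, handle, indent=2)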

How will the identity of the person accessing the data be ascertained?

Where data are shared only within the consortium, because the datasets are not yet finished or are undergoing IP checks, the data will be hosted internally and a username and password will be required for access (see GDPR rules). When the data are made public in EU or US repositories, completely anonymous access is normally allowed. This is the case for ENA as well, and both approaches are in line with GDPR requirements.

#if$_DATAPLANT Currently, data management relies on the annotated research context (ARC). It is password protected, so before any data or samples can be obtained, user authentication is required. #endif$_DATAPLANT

Making data interoperable

Are the data produced in the project interoperable, that is allowing data exchange and re-use between researchers, institutions, organizations, countries, etc. (i.e. adhering to standards for formats, as much as possible compliant with available (open) software applications, and in particular facilitating re-combinations with different datasets from different origins)?

Whenever possible, data will be stored in common and openly defined formats including all the necessary metadata to interpret and analyze data in a biological context. By default, no proprietary formats will be used. However, Microsoft Excel files (according to ISO/IEC 29500-1:2016) might be used as intermediates by the consortium#if$_DATAPLANT and by some ARC components#endif$_DATAPLANT. In addition, text documents might be edited in word-processor formats, but will be shared as PDF.
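Where Excel files are used as intermediates, they can be exported to an open, text-based format before sharing. The sketch below illustrates this, assuming pandas (with the openpyxl engine) is available and using a hypothetical workbook name; it converts every sheet of the workbook to a separate CSV file.

import pandas as pd  # requires pandas and openpyxl to be installed

# Read all sheets of a (hypothetical) Excel intermediate into a dict of DataFrames.
sheets = pd.read_excel("intermediate_measurements.xlsx", sheet_name=None)

# Write each sheet to an open CSV file for sharing.
for sheet_name, frame in sheets.items():
    frame.to_csv(f"intermediate_measurements_{sheet_name}.csv", index=False)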

What data and metadata vocabularies, standards or methodologies will you follow to make your data interoperable?

As noted above, we foresee using minimal standards such as #if$_RNASEQ|$_GENOMIC #if$_MINSEQE MINSEQE for sequencing data and #endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC MetaboLights-compatible forms for metabolites #if$_MIAPPE and MIAPPE for phenotyping-like data #endif$_MIAPPE. The minimal information standards will allow the integration of data across projects and their reuse according to established and tested protocols. We will also use ontological terms to enrich the data sets, relying on free and open ontologies where possible. Additional ontology terms might be created and canonized during the $_PROJECT.

Will you be using standard vocabularies for all data types present in your data set, to allow inter-disciplinary interoperability?

Open ontologies will be used where they are mature. As stated above, some ontologies and controlled vocabularies might need to be extended. #if$_DATAPLANT Here, the $_PROJECT will build on the advanced ontologies developed in DataPLANT. #endif$_DATAPLANT

In case it is unavoidable that you use uncommon or generate project specific ontologies or vocabularies, will you provide mappings to more commonly used ontologies?

Common and open ontologies will be used, so this question does not apply.

Increase data reuse (by clarifying licences)

How will the data be licensed to permit the widest re-use possible?

Open licenses, such as Creative Commons (CC), will be used whenever possible.

When will the data be made available for re-use? If an embargo is sought to give time to publish or seek patents, specify why and how long this will apply, bearing in mind that research data should be made available as soon as possible.

#if$_early Some raw data will be made public as soon as it is collected and processed.#endif$_early #if$_beforepublication Relevant processed datasets will be made public when the research findings are published.#endif$_beforepublication #if$_endofproject At the end of the project, all data without an embargo period will be published.#endif$_endofproject #if$_embargo Data that is subject to an embargo period will not be publicly accessible until the end of the embargo period.#endif$_embargo #if$_request Data will be made available upon request, allowing controlled sharing while ensuring responsible use.#endif$_request #if$_ipissue IP issues will be checked before publication. #endif$_ipissue All consortium partners will be encouraged to make data available before publication, openly and/or under pre-publication agreements#if$_GENOMIC, such as those started in Fort Lauderdale and set forth by the Toronto International Data Release Workshop#endif$_GENOMIC. This will be implemented as soon as IP-related checks are complete.

Are the data produced and/or used in the project usable by third parties, in particular after the end of the project? If the re-use of some data is restricted, explain why.

There will be no restrictions once the data are made public.

How long is it intended that the data remains re-usable?

The data will be made available for many years#if$_DATAPLANT and ideally indefinitely after the end of the project#endif$_DATAPLANT.

Data submitted to repositories (as detailed above), e.g. ENA or PRIDE, will be subject to the data storage regulations of those repositories.

Are data quality assurance processes described?

The data will be checked and curated. #if$_DATAPLANT Furthermore, data will be quality controlled (QC) using automatic procedures as well as manual curation #endif$_DATAPLANT.

2.3    Allocation of resources

What are the costs for making data FAIR in your project?

The $_PROJECT will bear the costs of data curation, #if$_DATAPLANT ARC consistency checks, #endif$_DATAPLANT and data maintenance/security before transfer to public repositories. Subsequent costs are then borne by the operators of these repositories.

Additionally, costs for post-publication storage are incurred by the end-point repositories (e.g. ENA); these costs are not charged to the $_PROJECT or its members but are covered by the operating budgets of these repositories.

How will these be covered? Note that costs related to open access to research data are eligible as part of the Horizon 2020 or Horizon Europe grant (if compliant with the Grant Agreement conditions).

The costs borne by the $_PROJECT are covered by the project funding. Pre-existing structures #if$_DATAPLANT such as structures, tools, and knowledge laid down in the DataPLANT consortium#endif$_DATAPLANT will also be used.

Who will be responsible for data management in your project?

The responsible person will be $_DATAOFFICER of the $_PROJECT.

Are the resources for long term preservation discussed (costs and potential value, who decides and how/what data will be kept and for how long)?

The data officer #if$_PARTNERS or $_PARTNERS #endif$_PARTNERS will ultimately decide on the strategy to preserve data that are not submitted to end-point subject-area repositories #if$_DATAPLANT or to ARCs in DataPLANT #endif$_DATAPLANT when the project ends. This will be in line with EU guidelines, institute policies, and data sharing based on EU and international standards.

2.4    Data security

What provisions are in place for data security (including data recovery as well as secure storage and transfer of sensitive data)?

Online platforms will be protected by vulnerability scanning, two-factor authentication and daily automatic backups allowing immediate recovery. All partners holding confidential project data are required to use secure platforms with automatic backups and offsite secure copies. #if$_DATAPLANT Where the DataHUB and ARCs of DataPLANT are used, data security is enforced as well. This comprises secure storage; usernames and passwords are generally transferred via separate, secure media.#endif$_DATAPLANT

Is the data safely stored in certified repositories for long term preservation and curation?

Data will be made available via the $_PROJECT platform using a user-friendly front end that allows data visualization. In addition, it will be ensured that data that can be stored in international, discipline-specific repositories are submitted to these repositories, which use specialized technologies:

#if$_GENETIC For genetic data: #if$_GENBANK NCBI-GenBank,#endif$_GENBANK #if$_SRA NCBI-SRA,#endif$_SRA #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #if$_GEO NCBI-GEO,#endif$_GEO #endif$_GENETIC

#if$_TRANSCRIPTOMIC For Transcriptomic data: #if$_SRA NCBI-SRA,#endif$_SRA #if$_GEO NCBI-GEO,#endif$_GEO #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_TRANSCRIPTOMIC

#if$_IMAGE For image data: #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE

#if$_METABOLOMIC For metabolomic data: #if$_METABOLIGHTS EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics Workbench,#endif$_METAWORKBENCH #if$_INTACT IntAct (molecular interactions),#endif$_INTACT #endif$_METABOLOMIC

#if$_PROTEOMIC For proteomics data: #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI ChEBI (Chemical Entities of Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC

#if$_PHENOTYPIC For phenotypic data: #if$_edal e!DAL-PGP (Plant Genomics & Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC

#if$_OTHEREP and $_OTHEREP will also be used to store data and the data will be processed there as well.#endif$_OTHEREP

2.5    Ethical aspects

Are there any ethical or legal issues that can have an impact on data sharing? These can also be discussed in the context of an ethics review. If relevant, include references to ethics deliverables and ethics chapter in the Description of the Action (DoA).

At the moment, we do not anticipate ethical or legal issues with data sharing. In terms of ethics, since this is plant data, there is no need for an ethics committee to deal with data from plants, although we will diligently follow the Nagoya Protocol on access and benefit sharing. #issuewarning You have to check and document any due diligence here. At the moment it is still open whether the Nagoya Protocol will also cover sequence information. In any case, if you use material that does not come from your (partner) country and characterize it physically or biochemically (e.g., metabolites, proteome, RNASeq), this might represent a Nagoya-relevant action, unless the material comes from, e.g., the US (not a party) or Ireland (not signed; still contact them), although other laws might apply. #endissuewarning

Is informed consent for data sharing and long term preservation included in questionnaires dealing with personal data?

The only personal data that will potentially be stored is the submitter name and affiliation in the metadata for the data. In addition, personal data will be collected for dissemination and communication activities using specific methods and procedures developed by the $_PROJECT partners to adhere to data protection rules. #issuewarning You need to inform the persons concerned, and preferably obtain WRITTEN consent, that you store e-mail addresses and names or even pseudonyms such as Twitter handles. We are sorry about these issues; we did not invent them. #endissuewarning

2.6    Other issues

Do you make use of other national/funder/sectorial/departmental procedures for data management? If yes, which ones?

Yes, the $_PROJECT will use common Research Data Management (RDM) tools such as #if$_DATAPLANT|$_NFDI resources developed by the NFDI of Germany,#endif$_DATAPLANT|$_NFDI #if$_FRENCH infrastructure developed by INRAE in France, #endif$_FRENCH #if$_EOSC and cloud services developed by the EOSC (European Open Science Cloud)#endif$_EOSC.

3     Annexes

3.1     Abbreviations

#if$_DATAPLANT

ARC Annotated Research Context

#endif$_DATAPLANT

CC Creative Commons

CC REL Creative Commons Rights Expression Language

DDBJ DNA Data Bank of Japan

DMP Data Management Plan

DoA Description of Action

DOI Digital Object Identifier

EBI European Bioinformatics Institute

ENA European Nucleotide Archive

EU European Union

FAIR Findable, Accessible, Interoperable, Reusable

GDPR General Data Protection Regulation (of the EU)

IP Intellectual Property

ISO International Organization for Standardization

MIAMET Minimum Information About a Metabolite Experiment

MIAPPE Minimum Information About a Plant Phenotyping Experiment

MinSEQe Minimum Information about a high-throughput Sequencing Experiment

NCBI National Center for Biotechnology Information

NFDI National Research Data Infrastructure (of Germany)

NGS Next Generation Sequencing

RDM Research Data Management

RNASeq RNA Sequencing

SOP Standard Operating Procedures

SRA Sequence Read Archive

#if$_DATAPLANT

SWATE Swate Workflow Annotation Tool for Excel

#endif$_DATAPLANT

ONP Oxford Nanopore

qRTPCR quantitative real time polymerase chain reaction

WP Work Package




Data Management Plan of the Horizon Europe Project $_PROJECT

Action Number:

$_FUNDINGPROGRAMME

Action Acronym:

$_PROJECT

Action Title:

$_PROJECT

Creation Date:

$_CREATIONDATE

Modification Date:

$_MODIFICATIONDATE


Introduction

#if$_EU The $_PROJECT is part of the Open Data Initiative (ODI) of the EU. #endif$_EU To best profit from open data, it is necessary not only to store the data but to make it Findable, Accessible, Interoperable, and Reusable (FAIR).#if$_PROTECT We support open and FAIR data; however, we also consider the need to protect individual data sets. #endif$_PROTECT

The aim of this document is to provide guidelines on the principles guiding data management in the $_PROJECT and to specify which data will be stored. This is achieved by using the responses to the EU questionnaire on the Data Management Plan (DMP) as the DMP document.

The detailed DMP states how data will be handled during and after the project. The $_PROJECT DMP is prepared according to the Horizon Europe online manual. #if$_UPDATE It will be updated, and its validity checked, several times during the $_PROJECT project. At the very least, this will happen at month $_UPDATEMONTH. #endif$_UPDATE

1.    Data Summary

Will you re-use any existing data and what will you re-use it for? State the reasons if re-use of any existing data has been considered but discarded.

The project builds on existing data sets and relies on them. #if$_RNASEQ For instance, without a proper genomic reference it is very difficult to analyze NGS data sets.#endif$_RNASEQ It is also important to include existing data sets on the expression and metabolic behaviour of $_STUDYOBJECT, as well as existing characterization and background knowledge#if$_PARTNERS of the partners#endif$_PARTNERS. Genomic references can simply be gathered from reference databases for genomes and sequences, such as the National Center for Biotechnology Information (NCBI, US), the European Bioinformatics Institute (EBI, EU), and the DNA Data Bank of Japan (DDBJ, JP). Furthermore, prior 'unstructured' data in the form of publications and data contained therein will be used for decision making.

What types and formats of data will the project generate or re-use?

The $_PROJECT will collect and/or generate the following types of raw data: $_PHENOTYPIC, $_GENETIC, $_IMAGE, $_RNASEQ, $_GENOMIC, $_METABOLOMIC, $_PROTEOMIC, $_TARGETED, $_MODELS, $_CODE, $_EXCEL, $_CLONED-DNA data, which are related to $_STUDYOBJECT. In addition, the raw data will also be processed and modified using analytical pipelines, which may yield different results or include ad hoc data analysis parts. #if$_DATAPLANT These pipelines will be tracked in the DataPLANT ARC.#endif$_DATAPLANT Therefore, care will be taken to document and archive these resources (including the analytical pipelines) as well#if$_DATAPLANT, relying on the expertise of the DataPLANT consortium#endif$_DATAPLANT.

What is the purpose of the data generation or re-use and its relation to the objectives of the project?

The $_PROJECT has the following aim: $_PROJECTAIM. Therefore, data collection#if!$_VVISUALIZATION and integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, integration and visualization #endif$_VVISUALIZATION #if$_DATAPLANT using the DataPLANT ARC structure are absolutely necessary #endif$_DATAPLANT #if!$_DATAPLANT through a standardized data management process is absolutely necessary #endif!$_DATAPLANT because the data are used not only to understand principles, but also to inform stakeholders about the provenance of the data and of the analyses performed on them. It is therefore necessary to ensure that the data are well generated and well annotated with metadata using open standards, as laid out in the next section.

What is the expected size of the data that you intend to generate or re-use?

We expect to generate about $_RAWDATA GB of raw data. The size of the derived data will be about $_DERIVEDDATA GB.

What is the origin/provenance of the data, either generated or re-used?

Public data will be extracted as described in the previous paragraph. For the $_PROJECT, specific data sets will be generated by the consortium partners.

Data of different types or representing different domains will be generated using unique approaches. For example:

#if$_PREVIOUSPROJECTS

Data from previous projects such as $_PREVIOUSPROJECTS will be considered.

#endif$_PREVIOUSPROJECTS

To whom might it be useful ('data utility'), outside your project?

The data will initially benefit the $_PROJECT partners, but will also be made available to selected stakeholders closely involved in the project, and then the scientific community working on $_STUDYOBJECT. $_DATAUTILITY In addition, the general public interested in $_STUDYOBJECT can also use the data after publication. The data will be disseminated according to the $_PROJECT's dissemination and communication plan#if$_DATAPLANT, which aligns with the DataPLANT platform and other means#endif$_DATAPLANT.

$_DATAUTILITY

2    FAIR data

2.1. Making data findable, including provisions for metadata

Will data be identified by a persistent identifier?

All data sets will receive unique identifiers, and they will be annotated with metadata.

Will rich metadata be provided to allow discovery? What metadata will be created? What disciplinary or general standards will be followed? In case metadata standards do not exist in your discipline, please outline what type of metadata will be created and how.

All datasets will be associated with unique identifiers and will be annotated with metadata. We will use the Investigation, Study, Assay (ISA) specification for metadata creation. The $_PROJECT will rely on community standards plus additional recommendations applicable in plant science, such as the #if$_PHENOTYPIC #if$_MIAPPE MIAPPE (Minimum Information About a Plant Phenotyping Experiment),#endif$_MIAPPE #endif$_PHENOTYPIC #if$_GENOMIC|$_GENETIC #if$_MIXS MIxS (Minimum Information about any (X) Sequence),#endif$_MIXS #if$_MIGSEU MigsEu (Minimum Information about a Genome Sequence: Eukaryote),#endif$_MIGSEU #if$_MIGSORG MigsOrg (Minimum Information about a Genome Sequence: Organelle),#endif$_MIGSORG #if$_MIMS MIMS (Minimum Information about a Metagenome Sequence (environmental)),#endif$_MIMS #if$_MIMARKSSPECIMEN MIMARKSSpecimen (Minimum Information about a Marker Gene Sequence: Specimen),#endif$_MIMARKSSPECIMEN #if$_MIMARKSSURVEY MIMARKSSurvey (Minimum Information about a Marker Gene Sequence: Survey),#endif$_MIMARKSSURVEY #if$_MISAG MISAG (Minimum Information about a Single Amplified Genome),#endif$_MISAG #if$_MIMAG MIMAG (Minimum Information about a Metagenome-Assembled Genome),#endif$_MIMAG #endif$_GENOMIC|$_GENETIC #if$_TRANSCRIPTOMIC #if$_MINSEQE MINSEQE (Minimum Information about a high-throughput SEQuencing Experiment),#endif$_MINSEQE #endif$_TRANSCRIPTOMIC #if$_TRANSCRIPTOMIC #if$_MIAME MIAME (Minimum Information About a Microarray Experiment),#endif$_MIAME #endif$_TRANSCRIPTOMIC #if$_IMAGE #if$_REMBI REMBI (Recommended Metadata for Biological Images),#endif$_REMBI #endif$_IMAGE #if$_PROTEOMIC #if$_MIAPE MIAPE (Minimum Information About a Proteomics Experiment),#endif$_MIAPE #if$_MIMIX MIMIx (Minimum Information about a Molecular Interaction eXperiment),#endif$_MIMIX #endif$_PROTEOMIC These specific standards, unlike cross-domain minimal sets such as Dublin Core (which mostly define the submitter and the general type of data), allow reuse by other researchers by defining properties of the plant material (see the preceding section). However, minimal cross-domain annotations #if$_DUBLINCORE Dublin Core,#endif$_DUBLINCORE #if$_MARC21 MARC 21,#endif$_MARC21 also remain part of the $_PROJECT. #if$_DATAPLANT The core integration with DataPLANT will also allow individual releases to be tagged with a Digital Object Identifier (DOI). #endif$_DATAPLANT #if$_OTHERSTANDARDS Other standards such as $_OTHERSTANDARDINPUT are also adhered to. #endif$_OTHERSTANDARDS

Will search keywords be provided in the metadata to optimize the possibility for discovery and then potential re-use?

Keywords about the experiment and the general consortium will be included, as well as an abstract about the data, where useful. In addition, certain keywords can be auto-generated from dense metadata and its underlying ontologies. #if$_DATAPLANT Here, DataPLANT strives to complement existing ontologies with standardized DataPLANT ontology terms where an ontology does not yet include the required variables. #endif$_DATAPLANT

Will metadata be offered in such a way that it can be harvested and indexed?

To maintain data integrity and to be able to re-analyze data, data sets will be given version numbers where this is useful (e.g. raw data is considered immutable, must not be changed, and therefore does not receive a version number). #if$_DATAPLANT This is automatically supported by the Git-based ARC infrastructure of DataPLANT. #endif$_DATAPLANT Data variables will be allocated standard names. For example, genes, proteins and metabolites will be named according to approved nomenclature and conventions. These will also be linked to functional ontologies where possible. Datasets will also be named in a meaningful way to ensure readability by humans. Plant names will include traditional names, binomials, and all strain/cultivar/subspecies/variety identifiers.

2.2.    Making data accessible

Repository

Will the data be deposited in a trusted repository?

Data will be made available via the $_PROJECT platform using a user-friendly front end that allows data visualization. In addition, it will be ensured that data that can be stored in international, discipline-specific repositories are submitted to these repositories, which use specialized technologies:

#if$_GENETIC For genetic data: #if$_GENBANK NCBI-GenBank,#endif$_GENBANK #if$_SRA NCBI-SRA,#endif$_SRA #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #if$_GEO NCBI-GEO,#endif$_GEO #endif$_GENETIC

#if$_TRANSCRIPTOMIC For Transcriptomic data: #if$_SRA NCBI-SRA,#endif$_SRA #if$_GEO NCBI-GEO,#endif$_GEO #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_TRANSCRIPTOMIC

#if$_IMAGE For image data: #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE

#if$_METABOLOMIC For metabolomic data: #if$_METABOLIGHTS EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics Workbench,#endif$_METAWORKBENCH #if$_INTACT IntAct (molecular interactions),#endif$_INTACT #endif$_METABOLOMIC

#if$_PROTEOMIC For proteomics data: #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI ChEBI (Chemical Entities of Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC

#if$_PHENOTYPIC For phenotypic data: #if$_edal e!DAL-PGP (Plant Genomics & Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC

#if$_OTHEREP and $_OTHEREP will also be used to store data and the data will be processed there as well.#endif$_OTHEREP

Have you explored appropriate arrangements with the identified repository where your data will be deposited?

Submission is free of charge, and it is the goal (at least of ENA) to obtain as much data as possible. Therefore, special arrangements are neither necessary nor useful, and catch-all repositories are not required. #if$_DATAPLANT For DataPLANT, this has been agreed upon, as all the omics repositories of the International Nucleotide Sequence Database Collaboration (INSDC) will be used. #endif$_DATAPLANT #issuewarning If no data management platform such as DataPLANT is used, you need to find an appropriate repository to store or archive your data after publication. #endissuewarning

Does the repository ensure that the data is assigned an identifier? Will the repository resolve the identifier to a digital object?

Data will be stored in the following repositories:

#if$_GENETIC For genetic data: #if$_GENBANK NCBI-GenBank,#endif$_GENBANK #if$_SRA NCBI-SRA,#endif$_SRA #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #if$_GEO NCBI-GEO,#endif$_GEO #endif$_GENETIC

#if$_TRANSCRIPTOMIC For Transcriptomic data: #if$_SRA NCBI-SRA,#endif$_SRA #if$_GEO NCBI-GEO,#endif$_GEO #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_TRANSCRIPTOMIC

#if$_IMAGE For image data: #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE

#if$_METABOLOMIC For metabolomic data: #if$_METABOLIGHTS EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics Workbench,#endif$_METAWORKBENCH #if$_INTACT IntAct (molecular interactions),#endif$_INTACT #endif$_METABOLOMIC

#if$_PROTEOMIC For proteomics data: #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI ChEBI (Chemical Entities of Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC

#if$_PHENOTYPIC For phenotypic data: #if$_edal e!DAL-PGP (Plant Genomics & Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC

#if$_OTHEREP and $_OTHEREP will also be used to store data and the data will be processed there as well.#endif$_OTHEREP Unstructured, less standardized data (e.g. experimental phenotypic measurements) will be annotated with metadata and, if complete, given a digital object identifier (DOI). #if$_DATAPLANT Whole data sets wrapped into an ARC will receive DOIs as well. #endif$_DATAPLANT

Those repositories are the most appropriate ones.

Data:

Will all data be made openly available? If certain datasets cannot be shared (or need to be shared under restricted access conditions), explain why, clearly separating legal and contractual reasons from intentional restrictions. Note that in multi-beneficiary projects it is also possible for specific beneficiaries to keep their data closed if opening their data goes against their legitimate interests or other constraints as per the Grant Agreement.

By default, all data sets from the $_PROJECT will be shared with the community and made openly available. However, before the data are released, all partners will be given the opportunity to check for potential intellectual property (IP) issues (according to the consortium agreement and background IP rights). #if$_INDUSTRY This applies in particular to data pertaining to the industry partners. #endif$_INDUSTRY IP protection will be prioritized for datasets that offer the potential for exploitation.

Note that in multi-beneficiary projects it is also possible for specific beneficiaries to keep their data closed if relevant provisions are made in the consortium agreement and are in line with the reasons for opting out.

If an embargo is applied to give time to publish or seek protection of the intellectual property (e.g. patents), specify why and how long this will apply, bearing in mind that research data should be made available as soon as possible.

#if$_early Some raw data will be made public as soon as it is collected and processed.#endif$_early #if$_beforepublication Relevant processed datasets will be made public when the research findings are published.#endif$_beforepublication #if$_endofproject At the end of the project, all data without an embargo period will be published.#endif$_endofproject #if$_embargo Data that is subject to an embargo period will not be publicly accessible until the end of the embargo period.#endif$_embargo #if$_request Data will be made available upon request, allowing controlled sharing while ensuring responsible use.#endif$_request #if$_ipissue IP issues will be checked before publication. #endif$_ipissue All consortium partners will be encouraged to make data available before publication, openly and/or under pre-publication agreements#if$_GENOMIC, such as those started in Fort Lauderdale and set forth by the Toronto International Data Release Workshop#endif$_GENOMIC. This will be implemented as soon as IP-related checks are complete.

Will the data be accessible through a free and standardized access protocol?

#if$_DATAPLANT DataPLANT stores data in the ARC, which is a Git repository. The DataHUB shares data and metadata as a GitLab instance. The Git and web protocols are open source and freely accessible. In addition, #endif$_DATAPLANT Zenodo and the endpoint repositories will also be used for access. In general, web-based access protocols are free and standardized.
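As an illustration of this free, standardized access path, the sketch below retrieves a public dataset either by cloning a Git repository or by a plain HTTPS download, using only the Python standard library. Both URLs are hypothetical placeholders, not actual $_PROJECT or DataHUB locations.

import subprocess
import urllib.request

# Clone a (hypothetical) public ARC repository over the Git protocol ...
subprocess.run(
    ["git", "clone", "https://example.org/myproject/example-arc.git"],
    check=True,
)

# ... or fetch a single (hypothetical) data file over plain HTTPS.
urllib.request.urlretrieve(
    "https://example.org/data/sample_01.csv",
    "sample_01.csv",
)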

If there are restrictions on use, how will access be provided to the data, both during and after the end of the project?

There are no restrictions, beyond the aforementioned IP checks, which are in line with e.g. European open data policies.

How will the identity of the person accessing the data be ascertained?

Where data are shared only within the consortium, because the datasets are not yet finished or are undergoing IP checks, the data will be hosted internally and a username and password will be required (see also our GDPR rules). When the data are made public in EU or US repositories, completely anonymous access is normally allowed. This is the case for ENA as well, and both approaches are in line with GDPR requirements. #if$_DATAPLANT Currently, data management relies on the Annotated Research Context (ARC). It is password protected, so user authentication is required before any data or samples can be obtained. #endif$_DATAPLANT

Is there a need for a data access committee (e.g. to evaluate/approve access requests to personal/sensitive data)?

Consequently, there is no need for a data access committee.

Metadata:

Will metadata be made openly available and licenced under a public domain dedication CC0, as per the Grant Agreement? If not, please clarify why. Will metadata contain information to enable the user to access the data?

Yes, where possible; e.g., the Creative Commons Rights Expression Language (CC REL) will be used for data not submitted to specialized repositories such as ENA.

How long will the data remain available and findable? Will metadata be guaranteed to remain available after data is no longer available?

The data will be made available for many years#if$_DATAPLANT and ideally indefinitely after the end of the project#endif$_DATAPLANT. In any case, data submitted to repositories (as detailed above), e.g. ENA or PRIDE, will be subject to the data storage regulations of those repositories.

Will documentation or reference about any software be needed to access or read the data be included? Will it be possible to include the relevant software (e.g. in open source code)?

#if$_PROPRIETARY The $_PROJECT relies on the tool(s) $_PROPRIETARY. #endif$_PROPRIETARY #if!$_PROPRIETARY No specialized software will be needed to access the data, usually just a modern browser. Access will be possible through web interfaces. For data processing after obtaining raw data, typical open-source software can be used. #endif!$_PROPRIETARY #if$_DATAPLANT DataPLANT offers tools such as the open-source SWATE plugin for Excel, the ARC Commander, and the DMP tool, which are not required to access the data but make interaction with it more convenient. #endif$_DATAPLANT #if$_DATAPLANT DataPLANT resources are well described, and their setup is documented on their GitHub project pages. #endif$_DATAPLANT As stated above, we use publicly available open-source and well-documented certified software #if$_PROPRIETARY except for $_PROPRIETARY#endif$_PROPRIETARY.

2.3. Making data interoperable

What data and metadata vocabularies, standards, formats or methodologies will you follow to make your data interoperable to allow data exchange and re-use within and across disciplines? Will you follow community-endorsed interoperability best practices? Which ones?

As noted above, we foresee using minimal standards such as the #if$_PHENOTYPIC #if$_MIAPPE MIAPPE (Minimum Information About a Plant Phenotyping Experiment),#endif$_MIAPPE #endif$_PHENOTYPIC #if$_GENOMIC|$_GENETIC #if$_MIXS MIxS (Minimum Information about any (X) Sequence),#endif$_MIXS #if$_MIGSEU MigsEu (Minimum Information about a Genome Sequence: Eukaryote),#endif$_MIGSEU #if$_MIGSORG MigsOrg (Minimum Information about a Genome Sequence: Organelle),#endif$_MIGSORG #if$_MIMS MIMS (Minimum Information about a Metagenome Sequence (environmental)),#endif$_MIMS #if$_MIMARKSSPECIMEN MIMARKSSpecimen (Minimum Information about a Marker Gene Sequence: Specimen),#endif$_MIMARKSSPECIMEN #if$_MIMARKSSURVEY MIMARKSSurvey (Minimum Information about a Marker Gene Sequence: Survey),#endif$_MIMARKSSURVEY #if$_MISAG MISAG (Minimum Information about a Single Amplified Genome),#endif$_MISAG #if$_MIMAG MIMAG (Minimum Information about a Metagenome-Assembled Genome),#endif$_MIMAG #endif$_GENOMIC|$_GENETIC #if$_TRANSCRIPTOMIC #if$_MINSEQE MINSEQE (Minimum Information about a high-throughput SEQuencing Experiment),#endif$_MINSEQE #endif$_TRANSCRIPTOMIC #if$_TRANSCRIPTOMIC #if$_MIAME MIAME (Minimum Information About a Microarray Experiment),#endif$_MIAME #endif$_TRANSCRIPTOMIC #if$_IMAGE #if$_REMBI REMBI (Recommended Metadata for Biological Images),#endif$_REMBI #endif$_IMAGE #if$_PROTEOMIC #if$_MIAPE MIAPE (Minimum Information About a Proteomics Experiment),#endif$_MIAPE #if$_MIMIX MIMIx (Minimum Information about a Molecular Interaction eXperiment),#endif$_MIMIX #endif$_PROTEOMIC These specific standards, unlike cross-domain minimal sets such as Dublin Core (which mostly define the submitter and the general type of data), allow reuse by other researchers by defining properties of the plant material (see the preceding section). However, minimal cross-domain annotations #if$_DUBLINCORE Dublin Core,#endif$_DUBLINCORE #if$_MARC21 MARC 21,#endif$_MARC21 also remain part of the $_PROJECT. #if$_DATAPLANT The core integration with DataPLANT will also allow individual releases to be tagged with a Digital Object Identifier (DOI). #endif$_DATAPLANT #if$_OTHERSTANDARDS Other standards such as $_OTHERSTANDARDINPUT are also adhered to. #endif$_OTHERSTANDARDS

Whenever possible, data will be stored in common and openly defined formats including all the necessary metadata to interpret and analyze data in a biological context. By default, no proprietary formats will be used. However, Microsoft Excel files (according to ISO/IEC 29500-1:2016) might be used as intermediates by the consortium#if$_DATAPLANT and by some ARC components#endif$_DATAPLANT. In addition, text documents might be edited in word-processor formats, but will be shared as PDF. Open ontologies will be used where they are mature. As stated above, some ontologies and controlled vocabularies might need to be extended. #if$_DATAPLANT Here, the $_PROJECT will build on the advanced ontologies developed in DataPLANT. #endif$_DATAPLANT

In case it is unavoidable that you use uncommon or generate project specific ontologies or vocabularies, will you provide mappings to more commonly used ontologies? Will you openly publish the generated ontologies or vocabularies to allow reusing, refining or extending them?

Common and open ontologies will be used. In fact, open biomedical ontologies will be used where they are mature. As stated in the previous question, some ontologies and controlled vocabularies might have to be extended. #if$_DATAPLANT Here, the $_PROJECT will build on the DataPLANT biology ontology (DPBO) developed in DataPLANT. #endif$_DATAPLANT Ontology databases such as the OBO Foundry will be used to publish ontologies. #if$_DATAPLANT The DPBO is also published on GitHub: https://github.com/nfdi4plants/nfdi4plants_ontology #endif$_DATAPLANT

Will your data include qualified references to other data (e.g. other data from your project, or datasets from previous research)?

References to other data will be made in the form of DOIs and ontology terms.
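A qualified reference of this kind can be expressed as a small machine-readable record. The sketch below pairs a DOI with ontology terms; the DOI, relation type, and the second ontology identifier are placeholders used for illustration only, not values prescribed by the $_PROJECT.

# A minimal, illustrative "qualified reference" linking a dataset to related
# data via a DOI and to concepts via ontology term identifiers.
qualified_reference = {
    "relates_to_doi": "10.1234/example.doi",         # hypothetical DOI
    "relation_type": "IsDerivedFrom",                 # illustrative relation label
    "ontology_terms": [
        {"id": "GO:0008150", "label": "biological_process"},
        {"id": "ENVO:XXXXXXX", "label": "placeholder environment term"},
    ],
}

print(qualified_reference)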

2.4. Increase data re-use

How will you provide documentation needed to validate data analysis and facilitate data re-use (e.g. readme files with information on methodology, codebooks, data cleaning, analyses, variable definitions, units of measurement, etc.)?

The documentation will be provided in the form of ISA (Investigation, Study, Assay) and CWL (Common Workflow Language) files. #if$_DATAPLANT Here, the $_PROJECT will build on the ARC container, which includes all the data, metadata, and documentation. #endif$_DATAPLANT

Will your data be made freely available in the public domain to permit the widest re-use possible? Will your data be licensed using standard reuse licenses, in line with the obligations set out in the Grant Agreement?

Yes, our data will be made freely available in the public domain to permit the widest re-use possible. Open licenses, such as Creative Commons (CC), will be used whenever possible.

Will the data produced in the project be useable by third parties, in particular after the end of the project?

There will be no restrictions once the data is made public.

Will the provenance of the data be thoroughly documented using the appropriate standards? Describe all relevant data quality assurance processes.

The $_PROJECT has the following aim: $_PROJECTAIM. Therefore, data collection#if!$_VVISUALIZATION and integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, integration and visualization #endif$_VVISUALIZATION #if$_DATAPLANT using the DataPLANT ARC structure are absolutely necessary #endif$_DATAPLANT #if!$_DATAPLANT through a standardized data management process is absolutely necessary #endif!$_DATAPLANT because the data are used not only to understand principles, but also to inform stakeholders about the provenance of the data and of the analyses performed on them. It is therefore necessary to ensure that the data are well generated and well annotated with metadata using open standards, as laid out in the preceding sections.

Describe all relevant data quality assurance processes. Further to the FAIR principles, DMPs should also address research outputs other than data, and should carefully consider aspects related to the allocation of resources, data security and ethical aspects.

The data will be checked and curated using data collection protocols, personnel training, data cleaning, data analysis, and quality control. #if$_DATAPLANT Furthermore, data will be analyzed for quality control (QC) problems using automatic procedures as well as by manual curation. #endif$_DATAPLANT All data quality assurance processes, including the data collection protocol, data cleaning procedures, data analysis techniques, and quality control measures, will be documented. This documentation will be kept for future reference and made available to stakeholders upon request.

3    Other research outputs

In addition to the management of data, beneficiaries should also consider and plan for the management of other research outputs that may be generated or re-used throughout their projects. Such outputs can be either digital (e.g. software, workflows, protocols, models, etc.) or physical (e.g. new materials, antibodies, reagents, samples, etc.).

In the current data management plan, any digital output, including but not limited to software, workflows, protocols, models, documents, templates, and notebooks, is treated as data. Therefore, all aforementioned digital objects are already described in detail. For non-digital objects, the data management plan will be closely connected to the digitalisation of the physical objects. #if$_DATAPLANT $_PROJECT will build a workflow which connects the ARC with an electronic lab notebook in order to also manage the physical objects. #endif$_DATAPLANT

Beneficiaries should consider which of the questions pertaining to FAIR data above, can apply to the management of other research outputs, and should strive to provide sufficient detail on how their research outputs will be managed and shared, or made available for re-use, in line with the FAIR principles.

Open licenses, such as Creative Commons (CC), will be used whenever possible for these other digital objects as well.

4.    Allocation of resources

What will the costs be for making data or other research outputs FAIR in your project (e.g. direct and indirect costs related to storage, archiving, re-use, security, etc.)?

The $_PROJECT will bear the costs of data curation, #if$_DATAPLANT ARC consistency checks, #endif$_DATAPLANT and data maintenance/security before transfer to public repositories. Subsequent costs are then borne by the operators of these repositories.

Additionally, costs for post-publication storage are incurred by the end-point repositories (e.g. ENA); they are not charged against the $_PROJECT or its members but covered by the operating budgets of these repositories.

How will these be covered? Note that costs related to research data/output management are eligible as part of the Horizon Europe grant (if compliant with the Grant Agreement conditions)

The costs borne by the $_PROJECT are covered by the project funding. Pre-existing structures #if$_DATAPLANT such as the structures, tools, and knowledge laid down in the DataPLANT consortium#endif$_DATAPLANT will also be used.

Who will be responsible for data management in your project?

The responsible person will be $_DATAOFFICER of the $_PROJECT.

How will long term preservation be ensured? Discuss the necessary resources to accomplish this (costs and potential value, who decides and how, what data will be kept and for how long)?

The data officer #if$_PARTNERS or $_PARTNERS #endif$_PARTNERS will ultimately decide on the strategy to preserve data that are not submitted to end-point subject-area repositories #if$_DATAPLANT or ARCs in DataPLANT #endif$_DATAPLANT when the project ends. This will be in line with EU guidelines, institute policies, and data sharing based on EU and international standards.

5.    Data security

What provisions are or will be in place for data security (including data recovery as well as secure storage/archiving and transfer of sensitive data)?

Online platforms will be protected by vulnerability scanning, two-factor authentication and daily automatic backups allowing immediate recovery. All partners holding confidential project data will use secure platforms with automatic backups and offsite secure copies. #if$_DATAPLANT Once DataHUB repositories and ARCs have been generated in DataPLANT, data security will be enforced. This comprises secure storage; passwords and usernames are generally transferred via separate secure media.#endif$_DATAPLANT

Will the data be safely stored in trusted repositories for long term preservation and curation?

Data will be made available via the $_PROJECT platform using a user-friendly front end that allows data visualization. Besides this, it will be ensured that data which can be stored in international, discipline-related repositories using specialized technologies are deposited there:

#if$_GENETIC For genetic data: #if$_GENBANK NCBI-GenBank,#endif$_GENBANK #if$_SRA NCBI-SRA,#endif$_SRA #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #if$_GEO NCBI-GEO,#endif$_GEO #endif$_GENETIC

#if$_TRANSCRIPTOMIC For Transcriptomic data: #if$_SRA NCBI-SRA,#endif$_SRA #if$_GEO NCBI-GEO,#endif$_GEO #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_TRANSCRIPTOMIC

#if$_IMAGE For image data: #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE

#if$_METABOLOMIC For metabolomic data: #if$_METABOLIGHTS EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics Workbench,#endif$_METAWORKBENCH #if$_INTACT Intact (Molecular interactions),#endif$_INTACT #endif$_METABOLOMIC

#if$_PROTEOMIC For proteomics data: #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI Chebi (Chemical Entities of Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC

#if$_PHENOTYPIC For phenotypic data: #if$_edal e!DAL-PGP (Plant Genomics & Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC

#if$_OTHEREP and $_OTHEREP will also be used to store data and the data will be processed there as well.#endif$_OTHEREP

6.    Ethics

Are there, or could there be, any ethics or legal issues that can have an impact on data sharing? These can also be discussed in the context of the ethics review. If relevant, include references to ethics deliverables and ethics chapter in the Description of the Action (DoA).

At the moment, we do not anticipate ethical or legal issues with data sharing. In terms of ethics, since this is plant data, there is no need for an ethics committee; however, due diligence for plant resource benefit sharing is considered. #issuewarning You have to check and enter any due diligence here; at the moment it is unclear whether the Nagoya Protocol (see Nagoya Protocol) will also cover sequence information. In any case, if you use material that does not originate from your (partner) country and characterize it physically or biochemically (e.g., metabolites, proteome, RNASeq), this might represent a Nagoya-relevant action unless the material comes from, e.g., the US (not a party) or Ireland (not signed; still contact them), but other laws might apply. #endissuewarning

Will informed consent for data sharing and long term preservation be included in questionnaires dealing with personal data?

The only personal data that will potentially be stored are the submitter name and affiliation in the metadata. In addition, personal data will be collected for dissemination and communication activities using specific methods and procedures developed by the $_PROJECT partners to adhere to data protection. #issuewarning You need to inform the persons concerned and preferably obtain WRITTEN consent before storing emails, names, or pseudonyms such as Twitter handles; we are very sorry about these issues, we did not invent them. #endissuewarning

7.    Other issues

Do you, or will you, make use of other national/funder/sectorial/departmental procedures for data management? If yes, which ones (please list and briefly describe them)?

Yes, the $_PROJECT will use common Research Data Management (RDM) tools such as #if$_DATAPLANT|$_NFDI resources developed by the NFDI of Germany,#endif$_DATAPLANT|$_NFDI #if$_FRENCH infrastructure developed by INRAe from France, #endif$_FRENCH #if$_EOSC and cloud service developed by EOSC (European Open Science Cloud)#endif$_EOSC .

8     Annexes

8.1     Abbreviations

#if$_DATAPLANT

ARC Annotated Research Context

#endif$_DATAPLANT

CC Creative Commons

CC CEL Creative Commons Rights Expression Language

DDBJ DNA Data Bank of Japan

DMP Data Management Plan

DoA Description of Action

DOI Digital Object Identifier

EBI European Bioinformatics Institute

ENA European Nucleotide Archive

EU European Union

FAIR Findable Accessible Interoperable Reusable

GDPR General data protection regulation (of the EU)

IP Intellectual Property

ISO International Organization for Standardization

MIAMET Minimal Information about Metabolite experiment

MIAPPE Minimal Information about Plant Phenotyping Experiment

MinSEQe Minimum Information about a high-throughput Sequencing Experiment

NCBI National Center for Biotechnology Information

NFDI National Research Data Infrastructure (of Germany)

NGS Next Generation Sequencing

RDM Research Data Management

RNASeq RNA Sequencing

SOP Standard Operating Procedures

SRA Short Read Archive

#if$_DATAPLANT

SWATE Swate Workflow Annotation Tool for Excel

#endif$_DATAPLANT

ONP Oxford Nanopore

qRTPCR quantitative real time polymerase chain reaction

WP Work Package


Data Management Plan of the DFG Project $_PROJECT

1.    Data description

1.1    Introduction

#if$_EU

The $_PROJECT is part of the Open Data Initiative (ODI) of the EU. #endif$_EU To best profit from open data, it is necessary not only to store data but to make data Findable, Accessible, Interoperable and Reusable (FAIR). #if$_PROTECT We support open and FAIR data; however, we also consider the need to protect individual data sets. #endif$_PROTECT

The aim of this document is to provide guidelines on the principles of data management in the $_PROJECT and to specify which data will be stored; this is achieved by using the responses to the DFG Data Management Plan (DMP) checklist to generate a DMP document.

The detailed DMP states how data will be handled during and after the project. The $_PROJECT DMP is prepared according to the DFG data management checklist. #if$_UPDATE It will be updated/its validity checked during the $_PROJECT project several times. At the very least, this will happen at month $_UPDATEMONTH. #endif$_UPDATE

1.2    How does your project generate new data?

Data of different types or of different domains will be generated differently. For example:

The $_PROJECT has the following aim: $_PROJECTAIM. Therefore, data collection#if!$_VVISUALIZATION and integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, integration and visualization #endif$_VVISUALIZATION #if$_DATAPLANT using the DataPLANT ARC structure are absolutely necessary #endif$_DATAPLANT #if!$_DATAPLANT through a standardized data management process are absolutely necessary #endif!$_DATAPLANT because the data are used not only to understand principles but also to trace the provenance of the data and of its analysis. Stakeholders must likewise be informed about the provenance of the data. It is therefore necessary to ensure that the data are well generated and well annotated with metadata using open standards, as laid out in the next section.

Public data will be extracted as described in paragraph 1.3. For the $_PROJECT, specific data sets will be generated by the consortium partners.

1.3    Is existing data reused?

The project builds on existing data sets and relies on them. #if$_RNASEQ For instance, without a proper genomic reference it is very difficult to analyze NGS data sets.#endif$_RNASEQ It is also important to include existing data sets on the expression and metabolic behaviour of $_STUDYOBJECT, but of course also existing characterization and background knowledge#if$_PARTNERS of the partners#endif$_PARTNERS. Genomic references can simply be gathered from reference databases for genomes/sequences, like the National Center for Biotechnology Information: NCBI (US); European Bioinformatics Institute: EBI (EU); DNA Data Bank of Japan: DDBJ (JP). Furthermore, prior 'unstructured' data in the form of publications and data contained therein will be used for decision making.

1.4    Which data types (in terms of data formats like image data, text data or measurement data) arise in your project and in what way are they further processed?

We foresee that the following data about $_STUDYOBJECT will be collected and generated at the very least: $_PHENOTYPIC, $_GENETIC, $_GENOMIC, $_METABOLOMIC, $_RNASEQ, $_IMAGE, $_PROTEOMIC, $_TARGETED, $_MODELS, $_CODE, $_EXCEL, $_CLONED-DNA and result data. Furthermore, data derived from the original raw data sets will also be collected. This is important, as different analytical pipelines might yield different results or include ad-hoc data analysis parts#if$_DATAPLANT and these pipelines will be tracked in the DataPLANT ARC#endif$_DATAPLANT. Therefore, specific care will be taken to document and archive these resources (including the analytic pipelines) as well#if$_DATAPLANT, relying on the vast expertise in the DataPLANT consortium#endif$_DATAPLANT.

1.5    To what extent do these arise or what is the anticipated data volume?

We expect to generate raw data in the range of $_RAWDATA GB of data. The size of the derived data will be about $_DERIVEDDATA GB.

2.    Documentation and data quality

2.1.    What approaches are being taken to describe the data in a comprehensible manner (such as the use of available metadata, documentation standards or ontologies)?

All datasets will be associated with unique identifiers and will be annotated with metadata. We will use the Investigation, Study, Assay (ISA) specification for metadata creation. The $_PROJECT will rely on community standards plus additional recommendations applicable in the plant sciences, such as the #if$_PHENOTYPIC #if$_MIAPPE MIAPPE (Minimum Information About a Plant Phenotyping Experiment),#endif$_MIAPPE #endif$_PHENOTYPIC #if$_GENOMIC|$_GENETIC #if$_MIXS MIxS (Minimum Information about any (X) Sequence),#endif$_MIXS #if$_MIGSEU MigsEu (Minimum Information about a Genome Sequence: Eucaryote),#endif$_MIGSEU #if$_MIGSORG MigsOrg (Minimum Information about a Genome Sequence: Organelle),#endif$_MIGSORG #if$_MIMS MIMS (Minimum Information about Metagenome or Environmental),#endif$_MIMS #if$_MIMARKSSPECIMEN MIMARKSSpecimen (Minimal Information about a Marker Specimen: Specimen),#endif$_MIMARKSSPECIMEN #if$_MIMARKSSURVEY MIMARKSSurvey (Minimal Information about a Marker Specimen: Survey),#endif$_MIMARKSSURVEY #if$_MISAG MISAG (Minimum Information about a Single Amplified Genome),#endif$_MISAG #if$_MIMAG MIMAG (Minimum Information about Metagenome-Assembled Genome),#endif$_MIMAG #endif$_GENOMIC|$_GENETIC #if$_TRANSCRIPTOMIC #if$_MINSEQE MINSEQE (Minimum Information about a high-throughput SEQuencing Experiment),#endif$_MINSEQE #endif$_TRANSCRIPTOMIC #if$_TRANSCRIPTOMIC #if$_MIAME MIAME (Minimum Information About a Microarray Experiment),#endif$_MIAME #endif$_TRANSCRIPTOMIC #if$_IMAGE #if$_REMBI REMBI (Recommended Metadata for Biological Images),#endif$_REMBI #endif$_IMAGE #if$_PROTEOMIC #if$_MIAPE MIAPE (Minimum Information About a Proteomics Experiment),#endif$_MIAPE #if$_MIMIX MIMix (Minimum Information about a Molecular Interaction eXperiment),#endif$_MIMIX #endif$_PROTEOMIC These specific standards, unlike cross-domain minimal sets such as Dublin Core (which mostly define the submitter and the general type of data), allow reusability by other researchers by defining properties of the plant (see the preceding section). However, minimal cross-domain annotations such as #if$_DUBLINCORE Dublin Core,#endif$_DUBLINCORE #if$_MARC21 MARC 21,#endif$_MARC21 also remain part of the $_PROJECT. #if$_DATAPLANT The core integration with DataPLANT will also allow individual releases to be tagged with a Digital Object Identifier (DOI). #endif$_DATAPLANT #if$_OTHERSTANDARDS Other standards such as $_OTHERSTANDARDINPUT are also adhered to. #endif$_OTHERSTANDARDS

Open ontologies will be used where they are mature. As stated above, some ontologies and controlled vocabularies might need to be extended. #if$_DATAPLANT Here, the $_PROJECT will build on the advanced ontologies developed in DataPLANT. #endif$_DATAPLANT Keywords about the experiment and the general consortium will be included, as well as an abstract about the data, where useful. In addition, certain keywords can be auto-generated from dense metadata and its underlying ontologies. #if$_DATAPLANT Here, DataPLANT strives to complement these with standardized DataPLANT ontologies that are supplemented where the ontology does not yet include the variables. #endif$_DATAPLANT

In fact, open biomedical ontologies will be used where they are mature. As stated in the previous question, sometimes ontologies and controlled vocabularies might have to be extended. #if$_DATAPLANT Here, the $_PROJECT will build on the advanced ontologies developed in DataPLANT. #endif$_DATAPLANT

2.2    What measures are being adopted to ensure high data quality?

The $_PROJECT has the following aim: $_PROJECTAIM. Therefore, data collection#if!$_VVISUALIZATION and integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, integration and visualization #endif$_VVISUALIZATION #if$_DATAPLANT using the DataPLANT ARC structure are absolutely necessary #endif$_DATAPLANT #if!$_DATAPLANT through a standardized data management process are absolutely necessary #endif!$_DATAPLANT because the data are used not only to understand principles but also to trace the provenance of the data and of its analysis. Stakeholders must likewise be informed about the provenance of the data. It is therefore necessary to ensure that the data are well generated and well annotated with metadata using open standards. Data variables will be allocated standard names. For example, genes, proteins and metabolites will be named according to approved nomenclature and conventions. These will also be linked to functional ontologies where possible. Datasets will also be named in a meaningful way to ensure readability by humans. Plant names will include traditional names, binomials, and all strain/cultivar/subspecies/variety identifiers.
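
To make the naming convention tangible, a small helper of the following kind could be used; the field order (project, organism, assay, date, version) is an illustrative assumption for this plan, not a prescribed standard.

```python
# Minimal sketch of a human-readable, machine-parsable dataset naming convention.
# The chosen fields and their order are an illustrative assumption for this guide.
import re
from datetime import date

def dataset_name(project: str, organism: str, assay: str, version: int) -> str:
    """Build a name such as 'myproject_arabidopsis-thaliana_rnaseq_<date>_v2'."""
    def slug(text: str) -> str:
        # lower-case, replace whitespace with hyphens, drop remaining punctuation
        return re.sub(r"[^a-z0-9-]", "", text.lower().replace(" ", "-"))
    return f"{slug(project)}_{slug(organism)}_{slug(assay)}_{date.today().isoformat()}_v{version}"

print(dataset_name("MyProject", "Arabidopsis thaliana", "RNASeq", 2))
```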

To maintain data integrity and to be able to re-analyze data, data sets will get version numbers where this is useful (e.g. raw data must not be changed, will not get a version number, and is considered immutable). #if$_DATAPLANT This is automatically supported by the ARC Git DataPLANT infrastructure. #endif$_DATAPLANT
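
Outside the ARC/DataHUB infrastructure, derived-data releases could be versioned, for example, with Git tags; the sketch below assumes a local Git repository and uses hypothetical repository and tag names.

```python
# Minimal sketch: tag a derived-data release in a local Git repository so that the
# exact state of the dataset can be referenced later. Paths and tag names are
# hypothetical; DataPLANT users obtain equivalent versioning via the ARC DataHUB.
import subprocess

REPO = "derived-data"   # hypothetical local repository containing derived data
TAG = "derived-v1.0"    # version tag for this data release

subprocess.run(["git", "-C", REPO, "add", "-A"], check=True)
subprocess.run(["git", "-C", REPO, "commit", "-m", "Derived data release 1.0"], check=True)
subprocess.run(["git", "-C", REPO, "tag", "-a", TAG, "-m", "Derived data release 1.0"], check=True)
```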

As mentioned above, we foresee using e.g. #if$_RNASEQ|$_GENOMIC #if$_MINSEQE MinSEQe for sequencing data and #endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC Metabolights-compatible forms for metabolites#if$_MIAPPE as well as MIAPPE for phenotyping-like data#endif$_MIAPPE. The latter will allow the integration of data across projects and safeguards the reuse of established and tested protocols. Additionally, we will use ontology terms to enrich the data sets, relying on free and open ontologies. Further ontology terms might be created and canonized during the $_PROJECT.

2.3    Are quality controls in place and if so, how do they operate?

The data will be checked and curated throughout the project period. #if$_DATAPLANT Furthermore, data will be analyzed for quality control (QC) problems using automatic procedures as well as by manual curation. #endif$_DATAPLANT PhD students and lab professionals will be responsible for first-hand quality control. Afterwards, the data will be checked and annotated by $_DATAOFFICER. #if$_RNASEQ|$_GENOMIC FastQC will be run on the base-called reads. #endif$_RNASEQ|$_GENOMIC Before publication, the data will be controlled again.
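
Where sequencing data are produced, the automated part of this QC could, for example, run FastQC over all read files as sketched below; the reads/ and qc_reports/ directories are hypothetical, and FastQC must be installed separately.

```python
# Minimal sketch: run FastQC on all FASTQ files in a (hypothetical) reads/ directory
# and collect the reports in qc_reports/ for later manual curation.
import glob
import pathlib
import subprocess

pathlib.Path("qc_reports").mkdir(exist_ok=True)
fastq_files = sorted(glob.glob("reads/*.fastq.gz"))

for fastq in fastq_files:
    # 'fastqc <file> -o <output dir>' writes one report per input file
    subprocess.run(["fastqc", fastq, "-o", "qc_reports"], check=True)
```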

2.4    Which digital methods and tools (e.g. software) are required to use the data?

The $_PROJECT will use common Research Data Management (RDM) tools such as #if$_DATAPLANT|$_NFDI resources developed by the NFDI of Germany,#endif$_DATAPLANT|$_NFDI #if$_FRENCH infrastructure developed by INRAe from France, #endif$_FRENCH #if$_EOSC and cloud service developed by EOSC (European Open Science Cloud)#endif$_EOSC .

#if$_PROPRIETARY The $_PROJECT relies on the tool(s) $_PROPRIETARY. #endif$_PROPRIETARY

#if!$_PROPRIETARY No specialized software will be needed to access the data, usually just a modern browser. Access will be possible through web interfaces. For data processing after obtaining raw data, typical open-source software can be used. As no proprietary software is needed, no documentation needs to be provided. #endif!$_PROPRIETARY

#if$_DATAPLANT However, DataPLANT resources are well described, and their setup is documented on their GitHub project pages. #endif$_DATAPLANT

#if$_DATAPLANT DataPLANT offers tools such as the open-source SWATE plugin for Excel, the ARC Commander, and the DMP generator, which make the interaction with data more convenient. #endif$_DATAPLANT

As stated above, here we use publicly available open-source and well-documented certified software #if$_PROPRIETARY except for $_PROPRIETARY #endif$_PROPRIETARY.

3.    Storage and technical archiving during the project

3.1    How is the data to be stored and archived throughout the project duration?

Data will be made available via the $_PROJECT platform using a user-friendly front end that allows data visualization. Besides this, it will be ensured that data which can be stored in international, discipline-related repositories using specialized technologies are deposited there: #if$_GENETIC #if$_GENBANK NCBI-GenBank,#endif$_GENBANK #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_GENETIC #if$_TRANSCRIPTOMIC|$_GENETIC #if$_SRA NCBI-SRA,#endif$_SRA #if$_GEO NCBI-GEO,#endif$_GEO #endif$_TRANSCRIPTOMIC|$_GENETIC #if$_TRANSCRIPTOMIC #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_TRANSCRIPTOMIC #if$_IMAGE #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE #if$_METABOLOMIC #if$_METABOLIGHTS EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics Workbench,#endif$_METAWORKBENCH #if$_INTACT Intact (Molecular interactions),#endif$_INTACT #endif$_METABOLOMIC #if$_PROTEOMIC #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI Chebi (Chemical Entities of Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC #if$_PHENOTYPIC #if$_edal e!DAL-PGP (Plant Genomics & Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC #if$_OTHEREP and $_OTHEREP will also be used to store data and the data will be processed there as well.#endif$_OTHEREP

Data will be made available for many years#if$_DATAPLANT and potentially indefinitely after the end of the project#endif$_DATAPLANT.

In any case, data submitted to international, discipline-related repositories that use specialized technologies (as detailed above), e.g. ENA or PRIDE, is subject to the local data storage regulations of those repositories.

3.2    What is in place to secure sensitive data throughout the project duration (access and usage rights)?

#if$_DATAPLANT In DataPLANT, data management relies on the Annotated Research Context (ARC). It is password protected, so before any data can be obtained or samples generated, an authentication needs to take place. #endif$_DATAPLANT

In case data is shared only within the consortium, e.g. because it is not yet finished or still under IP checks, the data is hosted internally and a username and password are required (see also our GDPR rules). Once data is made public in the final EU or US repositories, completely anonymous access is normally allowed; this is the case for ENA as well, and both approaches are in line with GDPR requirements.

There will be no restrictions once the data is made public.

4.    Legal obligations and conditions

4.1    What are the legal specifics associated with the handling of research data in your project?

At the moment, we do not anticipate ethical or legal issues with data sharing. In terms of ethics, since this is plant data, there is no need for an ethics committee; however, due diligence for plant resource benefit sharing is considered. #issuewarning You have to check and enter any due diligence here; at the moment it is unclear whether the Nagoya Protocol (see Nagoya Protocol) will also cover sequence information. In any case, if you use material that does not originate from your (partner) country and characterize it physically or biochemically (e.g., metabolites, proteome, RNASeq), this might represent a Nagoya-relevant action unless the material comes from, e.g., the US (not a party) or Ireland (not signed; still contact them), but other laws might apply. #endissuewarning

The only personal data that will potentially be stored are the submitter name and affiliation in the metadata. In addition, personal data will be collected for dissemination and communication activities using specific methods and procedures developed by the $_PROJECT partners to adhere to data protection. #issuewarning You need to inform the persons concerned and preferably obtain WRITTEN consent before storing emails, names, or pseudonyms such as Twitter handles; we are very sorry about these issues, we did not invent them. #endissuewarning

4.2    Do you anticipate any implications or restrictions regarding subsequent publication or accessibility?

Once data is transferred to the $_PROJECT platform#if$_DATAPLANT and ARCs have been generated in DataPLANT#endif$_DATAPLANT, data security will be enforced. This comprises secure storage; passwords and usernames are generally transferred via separate secure media.

4.3    What is in place to consider aspects of use and copyright law as well as ownership issues?

Open licenses, such as Creative Commons (CC), will be used whenever possible.

4.4    Are there any significant research codes or professional standards to be taken into account?

Whenever possible, data will be stored in common and openly defined formats including all the necessary metadata to interpret and analyze data in a biological context. By default, no proprietary formats will be used; however, Microsoft Excel files (according to ISO/IEC 29500-1:2016) might be used as intermediates by the consortium#if$_DATAPLANT and by some ARC components#endif$_DATAPLANT. In addition, text documents might be edited in word processors but will be shared as PDF.

5.    Data exchange and long-term data accessibility

5.1    Which data sets are especially suitable for use in other contexts?

The data will be useful for the $_PROJECT partners, the scientific community working on $_STUDYOBJECT or the general public interested in $_STUDYOBJECT. Hence, the $_PROJECT also strives to collect the data that has been disseminated and potentially advertise it#if$_DATAPLANT, e.g. through the DataPLANT platform or other means,#endif$_DATAPLANT if it is not already included in a publication, which is the most likely form of dissemination.

5.2    Which criteria are used to select research data to make it available for subsequent use by others?

By default, all data sets from the $_PROJECT will be shared with the community and made openly available. This happens, however, only after partners have had the opportunity to check for IP protection (according to agreements and background rights). #if$_INDUSTRY This applies in particular to data pertaining to industry. #endif$_INDUSTRY All partners also strive for IP protection of data sets where applicable; this will be checked and due diligence applied.

Note that in multi-beneficiary projects it is also possible for specific beneficiaries to keep their data closed if relevant provisions are made in the consortium agreement and are in line with the reasons for opting out.

5.3    Are you planning to archive your data in a suitable infrastructure?

#if$_DATAPLANT As the $_PROJECT is closely aligned with DataPLANT, the ARC converter and DataHUB will be used to find the end-point repositories and upload to the repositories automatically. #endif$_DATAPLANT

Data will be made available via the $_PROJECT platform using a user-friendly front end that allows data visualization. Besides this, it will be ensured that data which can be stored in international, discipline-related repositories using specialized technologies are deposited there:

#if$_GENETIC For genetic data: #if$_GENBANK NCBI-GenBank,#endif$_GENBANK #if$_SRA NCBI-SRA,#endif$_SRA #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #if$_GEO NCBI-GEO,#endif$_GEO #endif$_GENETIC

#if$_TRANSCRIPTOMIC For Transcriptomic data: #if$_SRA NCBI-SRA,#endif$_SRA #if$_GEO NCBI-GEO,#endif$_GEO #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_TRANSCRIPTOMIC

#if$_IMAGE For image data: #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE

#if$_METABOLOMIC For metabolomic data: #if$_METABOLIGHTS EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics Workbench,#endif$_METAWORKBENCH #if$_INTACT Intact (Molecular interactions),#endif$_INTACT #endif$_METABOLOMIC

#if$_PROTEOMIC For proteomics data: #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI Chebi (Chemical Entities of Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC

#if$_PHENOTYPIC For phenotypic data: #if$_edal e!DAL-PGP (Plant Genomics & Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC

#if$_OTHEREP and $_OTHEREP will also be used to store data and the data will be processed there as well.#endif$_OTHEREP

Submission is free of charge, and it is the goal (at least of ENA) to obtain as much data as possible. Therefore, special arrangements are neither necessary nor useful, and catch-all repositories are not required. #if$_DATAPLANT For DataPLANT, this has been agreed upon. #endif$_DATAPLANT #issuewarning If no data management platform such as DataPLANT is used, you need to find an appropriate repository to store or archive your data after publication. #endissuewarning

5.4    If so, how and where? Are there any retention periods?

There are no restrictions, beyond the aforementioned IP checks, which are in line with e.g. European open data policies.

The $_PARTNERS decide on the preservation of data not submitted to end-point subject-area repositories #if$_DATAPLANT or ARCs in DataPLANT#endif$_DATAPLANT after the project ends. This will be in line with EU and institute policies and data sharing based on EU and international standards.

5.5    When is the research data available for use by third parties?

#if$_early Some raw data is made public as soon as it is collected and processed.#endif$_early #if$_beforepublication Relevant processed datasets are made public when the research findings are published.#endif$_beforepublication #if$_endofproject At the end of the project, all data without embargo period will be published.#endif$_endofproject #if$_embargo Data, which is subject to an embargo period, is not publicly accessible until the end of embargo period.#endif$_embargo #if$_request Data is made available upon request, allowing controlled sharing while ensuring responsible use.#endif$_request #if$_ipissue IP issues will be checked before publication. #endif$_ipissue All consortium partners will be encouraged to make data available before publication, openly and/or under pre-publication agreements #if$_GENOMIC such as those started in Fort Lauderdale and set forth by the Toronto International Data Release Workshop. #endif$_GENOMIC This will be implemented as soon as IP-related checks are complete.

6.    Responsibilities and resources

6.1    Who is responsible for adequate handling of the research data (description of roles and responsibilities within the project)?

The responsible person will be $_DATAOFFICER as data officer. The data responsible(s) (the data officer#if$_PARTNERS or $_PARTNERS #endif$_PARTNERS) decide on the preservation of data not submitted to end-point subject-area repositories #if$_DATAPLANT or ARCs in DataPLANT #endif$_DATAPLANT after the project end. This will be in line with EU and institute policies and data sharing based on EU and international standards.

6.2    Which resources (costs; time or other) are required to implement adequate handling of research data within the project?

The costs comprise data curation, #if$_DATAPLANT ARC consistency checks, #endif$_DATAPLANT and maintenance on the $_PROJECT's side.

Additionally, last-level costs for storage are incurred by the end-point repositories (e.g. ENA); they are not charged against the $_PROJECT or its members but covered by the operating budgets of these repositories.

A large part of the cost is covered by the $_PROJECT #if$_DATAPLANT and the structures, tools and knowledge laid down in the DataPLANT consortium. #endif$_DATAPLANT

6.3    Who is responsible for curating the data once the project has ended?

As applicable, $_DATAOFFICER, who is responsible for ongoing data maintenance, will also take care of it after the end of the $_PROJECT. #if$_DATAPLANT DataPLANT, as an external data archive, may provide such services in some cases. #endif$_DATAPLANT

7     Annexes

7.1     Abbreviations

#if$_DATAPLANT

ARC Annotated Research Context

#endif$_DATAPLANT

CC Creative Commons

CC CEL Creative Commons Rights Expression Language

DDBJ DNA Data Bank of Japan

DMP Data Management Plan

DoA Description of Action

DOI Digital Object Identifier

EBI European Bioinformatics Institute

ENA European Nucleotide Archive

EU European Union

FAIR Findable Accessible Interoperable Reusable

GDPR General data protection regulation (of the EU)

IP Intellectual Property

ISO International Organization for Standardization

MIAMET Minimal Information about Metabolite experiment

MIAPPE Minimal Information about Plant Phenotyping Experiment

MinSEQe Minimum Information about a high-throughput Sequencing Experiment

NCBI National Center for Biotechnology Information

NFDI National Research Data Infrastructure (of Germany)

NGS Next Generation Sequencing

RDM Research Data Management

RNASeq RNA Sequencing

SOP Standard Operating Procedures

SRA Short Read Archive

#if$_DATAPLANT

SWATE Swate Workflow Annotation Tool for Excel

#endif$_DATAPLANT

ONP Oxford Nanopore

qRTPCR quantitative real time polymerase chain reaction

WP Work Package

Practical Data Management Guide of the $_PROJECT

This practical guide to data management in the $_PROJECT should be considered a minimum description, leaving flexibility to include additional actions specific to a domain or required by national or local legislation.#if$_EU The $_PROJECT will follow the EU FAIR principles. #endif$_EU


The practical guide to data management in the $_PROJECT aims at providing a complete walkthrough for the researcher. The contents are customized based on the user input in the Data Management Plan Generator (DMPG). The practices in this guide are customized to fit the related legal, ethical, standardization and funding body requirements. The suitable practices cover all steps of the data management life-cycle:


  1. Data acquisition:

    1. Data generation

Data should be generated by devices whose output is compatible with open formats. The $_STUDYOBJECT should be compliant with biodiversity protocols. The protocols used to collect $_PHENOTYPIC, $_GENETIC, $_GENOMIC, $_METABOLOMIC, $_RNASEQ data about $_STUDYOBJECT will be stored#if$_DATAPLANT in the assays folder of the ARC repositories#endif$_DATAPLANT#if!$_DATAPLANT in a FAIR data storage#endif!$_DATAPLANT.
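
A minimal sketch of how such an assay, its raw data and the corresponding protocol files could be laid out on disk is shown below; the folder names follow the general ARC layout (assays/ with dataset/ and protocols/ subfolders), while the ARC root, assay name and protocol file are hypothetical examples.

```python
# Minimal sketch: create an ARC-like folder structure for one assay, with raw data
# under dataset/ and the collection protocols under protocols/. Names are
# hypothetical examples; consult the ARC specification for the authoritative layout.
from pathlib import Path

arc_root = Path("my-project-arc")            # hypothetical ARC root folder
assay = arc_root / "assays" / "rnaseq-drought-2024"

for folder in [arc_root / "studies", assay / "dataset", assay / "protocols",
               arc_root / "workflows", arc_root / "runs"]:
    folder.mkdir(parents=True, exist_ok=True)

# protocols used for data generation are stored next to the raw data they describe
(assay / "protocols" / "rna_extraction.md").write_text("# RNA extraction protocol\n")
```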

    2. Data collection

The data collection process is conducted by experimental scientists and stewarded by $_DATAOFFICER.#if$_DATAPLANT An electronic lab notebook will be used to ensure that enough metadata is recorded and to guarantee that the data can be further reused.#endif$_DATAPLANT

    3. Data Organization

The data organization process is conducted by $_DATAOFFICER. The detailed organization method and procedure are reported to the PIs. #if$_DATAPLANT The data organization will profit from the knowledge base and database of DataPLANT; Elasticsearch will be used to find better ways to organize the data. #endif$_DATAPLANT



  2. Annotation

    1. Workflow documentation

The data collection process is conducted by experimental scientists and stewarded by $_DATAOFFICER.#if$_DATAPLANT An electronic lab notebook is used to ensure that enough metadata is recorded and to guarantee that the data can be further reused. The workflow can be retrieved from the electronic notebook using the toolkits provided by DataPLANT, such as SWATE and the ARC Commander. #endif$_DATAPLANT

    2. Metadata completion

In case some metadata is still missing, it will be completed based on the documentation from the experimental scientists and the data officer. #if$_DATAPLANT Raw data identifiers and parsers provided by DataPLANT will be used to extract metadata directly from the raw data files. The metadata collected from the raw data files can also be used to validate the previously collected metadata in case there are any mistakes. #endif$_DATAPLANT We foresee using #if$_RNASEQ|$_GENOMIC e.g.#if$_MINSEQE MinSEQe for sequencing data and#endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC Metabolights-compatible forms for metabolites as well as MIAPPE for phenotyping-like data. The latter will allow the integration of data across projects and safeguards the reuse of established and tested protocols. Additionally, we will use ontology terms to enrich the data sets, relying on free and open ontologies. Further ontology terms might be created and canonized during the $_PROJECT.
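
As an example of such a parser, instrument, run and lane information encoded in Illumina-style FASTQ read headers could be recovered as sketched below; the file name is hypothetical, and real parsers, such as those provided by DataPLANT, will be considerably more complete.

```python
# Minimal sketch: recover instrument, run and lane metadata from the first read
# header of an Illumina-style FASTQ file. The file name is hypothetical and the
# parsing assumes the common '@instrument:run:flowcell:lane:...' header layout.
import gzip

def fastq_header_metadata(path: str) -> dict:
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        header = handle.readline().strip()
    fields = header.lstrip("@").split(" ")[0].split(":")
    return {
        "instrument": fields[0],
        "run_number": fields[1] if len(fields) > 1 else None,
        "flowcell_id": fields[2] if len(fields) > 2 else None,
        "lane": fields[3] if len(fields) > 3 else None,
    }

print(fastq_header_metadata("reads/sample_01.fastq.gz"))  # hypothetical file
```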


  3. Maintenance:

    1. Data storage

Raw data collected in the previous steps are stored immediately#if$_DATAPLANT using the infrastructure of DataPLANT#endif$_DATAPLANT#if!$_DATAPLANT in a secure infrastructure. The ARC (Annotated Research Context) is used as a container to store the raw data as well as the metadata and workflows.#endif!$_DATAPLANT

    2. Data curation

#if$_DATAPLANT Data stored in the ARC is curated regularly, whenever updates or revisions are needed.#endif$_DATAPLANT #if!$_DATAPLANT Data is curated regularly, whenever updates or revisions are needed.#endif!$_DATAPLANT



  4. Publication and sharing

    1. Data publishing

Data will be made available via the $_PROJECT platform using a user-friendly front end that allows data visualization. Besides this, it will be ensured that data which can be stored in international, discipline-related repositories using specialized technologies are deposited there: #if$_GENETIC #if$_GENBANK NCBI-GenBank,#endif$_GENBANK #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_GENETIC #if$_TRANSCRIPTOMIC|$_GENETIC #if$_SRA NCBI-SRA,#endif$_SRA #if$_GEO NCBI-GEO,#endif$_GEO #endif$_TRANSCRIPTOMIC|$_GENETIC #if$_TRANSCRIPTOMIC #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_TRANSCRIPTOMIC #if$_IMAGE #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE #if$_METABOLOMIC #if$_METABOLIGHTS EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics Workbench,#endif$_METAWORKBENCH #if$_INTACT Intact (Molecular interactions),#endif$_INTACT #endif$_METABOLOMIC #if$_PROTEOMIC #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI Chebi (Chemical Entities of Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC #if$_PHENOTYPIC #if$_edal e!DAL-PGP (Plant Genomics & Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC #if$_OTHEREP and $_OTHEREP will also be used to store data and the data will be processed there as well.#endif$_OTHEREP

    2. Data sharing

In case data is shared only within the consortium, e.g. because it is not yet finished or still under IP checks, the data is hosted internally and a username and password are required (see also our GDPR rules). Once data is made public in the final EU or US repositories, completely anonymous access is normally allowed; this is the case for ENA as well, and both approaches are in line with GDPR requirements.

Metadata focus timeline



Study initialization: The metadata of the study is created at the beginning of the project and updated continuously afterwards#if$_DATAPLANT; the input of the DMP generator created during the proposal stage can be reused#endif$_DATAPLANT.

Sample collection: The information used to identify exact samples is initiated before the experiments and updated at the assay creation stage. #if$_DATAPLANT The sample SWATE template will be used to document the sample metadata. The part of the sample metadata that can be retrieved from the raw data will be updated afterwards using the ARC parsers. #endif$_DATAPLANT

Assay creation: Assay metadata must be collected as a daily routine during the experimental phase. #if$_DATAPLANT Electronic lab notebooks will be used to guarantee the applicability and correctness of the notebook content. #endif$_DATAPLANT

Computational analysis: Workflow annotation will be conducted during the computational analysis phase. #if$_DATAPLANT The workflow metadata will be stored in the assay folder of the ARC. #endif$_DATAPLANT

Results sharing: The metadata of results is collected after all modifications and should not be changed after publication. #if$_DATAPLANT The collection of result metadata before publication and the conversion from the ARC to the repositories will be handled by the ARC2REPO converter with minimal effort. #endif$_DATAPLANT


Preferred formats for raw data

#if$_GENOMIC  

extension_ident  Format Name
.h5  Hierarchical Data Format
.bam  compressed binary version of a SAM file
.cram  compressed columnar file format for storing biological sequences aligned to a reference sequence
.fa  fasta
.faa  fasta
.fas  fasta
.fasta  fasta
.fastq  fastq
.ffn  fasta
.fna  fasta
.fq  fastq
.frn  fasta
.sff  sff-trim

#endif$_GENOMIC


#if$_RNASEQ  

.bam  compressed binary version of a SAM file
.cram  compressed columnar file format for storing biological sequences aligned to a reference sequence
.fa  fasta
.faa  fasta
.fas  fasta
.fast5  HDF5
.fasta  fasta
.fastq  fastq
.ffn  fasta
.fna  fasta
.fq  fastq
.frn  fasta
.sff  sff-trim
bas.h5  HDF5
.h5  Hierarchical Data Format

#endif$_RNASEQ


 #if$_METABOLOMIC  

.cdf  netCDF (AIA/ANDI) interchange data format
.cmp  netCDF compare file
.abf  Axon Binary File
.d  Agilent
.dat  Chromtech, Finnigan, VG
.idb  MASSLAB binary file
.jpf  Mass Center Main Mass Spectrometry Data (JEOL USA, Inc.)
.lcd  Shimadzu LC Solution / Labsolutions Data File
.mgf  Mascot Generic File
.raw  Thermo Xcalibur, Micromass (Waters), PerkinElmer, Waters
.scan  a spectrum or a Total Ion Chromatogram (TIC)
.wiff  ABI/Sciex
.xps  Thermo Fisher Scientific K-Alpha+ spectrometer file
cdf.cmp  netCDF compare file

#endif$_METABOLOMIC


 #if$_PROTEOMIC  

.baf  Bruker
.d  Agilent
.dat  Chromtech, Finnigan, VG
.fid  Bruker
.ita  ION-TOF
.itm  ION-TOF
.mgf  Mascot Generic File
.ms  Finnigan (Thermo)
.ms2  Sequest MS/MS peak list
.pkl  Micromass peak list
.qgd  Shimadzu
.raw  Thermo Xcalibur, Micromass (Waters), PerkinElmer, Waters
.raw  Physical Electronics/ULVAC-PHI
.sms  Bruker/Varian
.spc  Shimadzu
.splib  spectral library file
.t2d  ABI/Sciex
.tdc  Physical Electronics/ULVAC-PHI
.wiff  ABI/Sciex
.xms  Bruker/Varian
.yep  Bruker
.dta  Sequest MS/MS peak list
.msp
.nist


#endif$_PROTEOMIC



Datenmanagementplan (Beta test)

Projektname: $_PROJECT

Forschungsförderer: Bundesministerium für Bildung und Forschung

Förderprogramm: $_FUNDINGPROGRAMME

FKZ: $_DMPVERSION

Projektkoordinator: $_USERNAME

Kontaktperson Datenmanagement: $_DATAOFFICER

Kontakt: $_EMAIL

Projektbeschreibung:

Das $_PROJECT hat folgendes Ziel: $_PROJECTAIM. Daher sind Datenerhebung#if!$_VVISUALIZATION und Integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, Integration und Visualisierung #endif$_VVISUALIZATION#if$_DATAPLANT unter Verwendung der DataPLANT ARC-Struktur absolut notwendig,#endif$_DATAPLANT#if!$_DATAPLANT durch einen standardisierten Datenmanagementprozess absolut notwendig,#endif!$_DATAPLANT da die Daten nicht nur zum Verständnis von Prinzipien verwendet werden, sondern auch über die Herkunft der analysierten Daten informiert werden muss. Stakeholder müssen ebenfalls über die Herkunft der Daten informiert werden. Es ist daher notwendig sicherzustellen, dass die Daten gut generiert und auch gut mit Metadaten unter Verwendung offener Standards annotiert werden, wie im nächsten Abschnitt dargelegt.

Das $_PROJECT wird die folgenden Arten von Rohdaten sammeln und/oder generieren: $_PHENOTYPIC, $_GENETIC, $_IMAGE, $_RNASEQ, $_GENOMIC, $_METABOLOMIC, $_PROTEOMIC, $_TARGETED, $_MODELS, $_CODE, $_EXCEL, $_CLONED-DNA Daten, die sich auf $_STUDYOBJECT beziehen. Zusätzlich werden die Rohdaten auch durch analytische Pipelines verarbeitet und modifiziert, was zu unterschiedlichen Ergebnissen führen kann oder ad-hoc-Datenanalyse-Teile umfassen kann. #if$_DATAPLANT Diese Pipelines werden im DataPLANT ARC verfolgt.#endif$_DATAPLANT Daher wird darauf geachtet, diese Ressourcen (einschließlich der analytischen Pipelines) zu dokumentieren und zu archivieren#if$_DATAPLANT unter Rückgriff auf die Expertise im DataPLANT-Konsortium#endif$_DATAPLANT.

Erstellungsdatum: $_CREATIONDATE

Änderungsdatum: $_MODIFICATIONDATE

Zu beachtende Vorgaben:

#if$_EU Das $_PROJECT ist Teil der Open Data Initiative (ODI) der EU. #endif$_EU Um optimal von offenen Daten zu profitieren, ist es notwendig, die Daten nicht nur zu speichern, sondern sie auch auffindbar, zugänglich, interoperabel und wiederverwendbar (FAIR) zu machen. #if$_PROTECT Wir unterstützen offene und FAIR-Daten, berücksichtigen jedoch auch die Notwendigkeit, einzelne Datensätze zu schützen. #endif$_PROTECT

#if$_DATAPLANT Durch die Implementierung von DataPLANT können Forscher sicherstellen, dass alle relevanten Richtlinien und Anforderungen im Zusammenhang mit dem Datenmanagement eingehalten werden, was zu einer höheren Qualität und Zuverlässigkeit der Forschungsdaten führt. #endif$_DATAPLANT

Datenerhebung

Öffentliche Daten werden wie im vorherigen Absatz beschrieben extrahiert. Für das $_PROJECT werden spezifische Datensätze von den Konsortialpartnern generiert.

Daten unterschiedlicher Typen oder aus verschiedenen Bereichen werden mit einzigartigen Ansätzen generiert. Zum Beispiel:

#if$_PREVIOUSPROJECTS

Daten aus früheren Projekten wie $_PREVIOUSPROJECTS werden berücksichtigt.

#endif$_PREVIOUSPROJECTS

Wir erwarten die Erzeugung von $_RAWDATA GB Rohdaten und bis zu $_DERIVEDDATA GB verarbeiteten Daten.

Datenspeicherung:

#if$_DATAPLANT In DataPLANT basiert die Datenspeicherung auf dem Annotated Research Context (ARC). Dieser ist passwortgeschützt, daher muss vor dem Erhalt von Daten oder der Generierung von Proben eine Authentifizierung erfolgen. #endif$_DATAPLANT

Online-Plattformen werden durch Schwachstellen-Scans, Zwei-Faktor-Authentifizierung und tägliche automatische Backups geschützt, die eine sofortige Wiederherstellung ermöglichen. Alle Partner, die vertrauliche Projektdaten halten, nutzen sichere Plattformen mit automatischen Backups und sicheren externen Kopien. #if$_DATAPLANT Sobald DataHUB-Repositorien und ARCs in DataPLANT generiert wurden, wird die Datensicherheit durchgesetzt. Dies umfasst sichere Speicherung; Passwörter und Benutzernamen werden generell über separate sichere Medien übertragen. #endif$_DATAPLANT

Das $_PROJECT trägt die Kosten für die Datenkuratierung, #if$_DATAPLANT ARC-Konsistenzprüfungen, #endif$_DATAPLANT und die Datenwartung/-sicherheit vor der Übertragung an öffentliche Repositorien. Nachfolgende Kosten werden dann von den Betreibern dieser Repositorien getragen.

Zusätzlich werden Kosten für die Speicherung nach der Veröffentlichung von den Endpunkt-Repositorien (z.B. ENA) getragen, jedoch nicht vom $_PROJECT oder seinen Mitgliedern, sondern durch das Betriebsbudget dieser Repositorien.

Es wird sichergestellt, dass Daten, die in internationalen, disziplinspezifischen Repositories gespeichert werden können, die spezialisierte Technologien nutzen:

#if$_GENETIC Für genetische Daten: #if$_GENBANK NCBI-GenBank,#endif$_GENBANK #if$_SRA NCBI-SRA,#endif$_SRA #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #if$_GEO NCBI-GEO,#endif$_GEO #endif$_GENETIC

#if$_TRANSCRIPTOMIC Für Transkriptomdaten: #if$_SRA NCBI-SRA,#endif$_SRA #if$_GEO NCBI-GEO,#endif$_GEO #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_TRANSCRIPTOMIC

#if$_IMAGE Für Bilddaten: #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE

#if$_METABOLOMIC Für Metabolomdaten: #if$_METABOLIGHTS EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics Workbench,#endif$_METAWORKBENCH #if$_INTACT Intact (Molecular interactions),#endif$_INTACT #endif$_METABOLOMIC

#if$_PROTEOMIC Für Proteomikdaten: #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI Chebi (Chemical Entities of Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC

#if$_PHENOTYPIC Für phänotypische Daten: #if$_edal e!DAL-PGP (Plant Genomics & Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC

#if$_OTHEREP und $_OTHEREP werden auch verwendet, um Daten zu speichern und die Daten werden dort ebenfalls verarbeitet.#endif$_OTHEREP

Die Dateibenennung erfolgt nach folgendem Standard:

Datenvariablen werden mit Standardnamen versehen. Zum Beispiel werden Gene, Proteine und Metaboliten gemäß anerkannter Nomenklatur und Konventionen benannt. Diese werden nach Möglichkeit auch mit funktionalen Ontologien verknüpft. Datensätze werden ebenfalls sinnvoll benannt, um die Lesbarkeit durch Menschen zu gewährleisten. Pflanzennamen umfassen traditionelle Namen, Binomialnamen und alle Stamm-/Kultivar-/Unterart-/Sortenbezeichner.

Datendokumentation

Wir verwenden die Investigation, Study, Assay (ISA) Spezifikation zur Metadaten-Erstellung. #if$_RNASEQ|$_GENOMIC Für spezifische Daten (z.B. RNASeq oder genomische Daten) verwenden wir Metadatentemplates der Endpunkt-Repositorien. #if$_MINSEQE The Minimum Information About a Next-generation Sequencing Experiment (MinSEQe) wird ebenfalls verwendet. #endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC Die folgenden Metadaten-/Mindestinformationsstandards werden zur Sammlung von Metadaten verwendet: #if$_GENOMIC|$_GENETIC #if$_MIXS MIxS (Minimum Information about any (X) Sequence),#endif$_MIXS #if$_MIGSEU MigsEu (Minimum Information about a Genome Sequence: Eucaryote),#endif$_MIGSEU #if$_MIGSORG MigsOrg (Minimum Information about a Genome Sequence: Organelle),#endif$_MIGSORG #if$_MIMS MIMS (Minimum Information about Metagenome or Environmental),#endif$_MIMS #if$_MIMARKSSPECIMEN MIMARKSSpecimen (Minimal Information about a Marker Specimen: Specimen),#endif$_MIMARKSSPECIMEN #if$_MIMARKSSURVEY MIMARKSSurvey (Minimal Information about a Marker Specimen: Survey),#endif$_MIMARKSSURVEY #if$_MISAG MISAG (Minimum Information about a Single Amplified Genome),#endif$_MISAG #if$_MIMAG MIMAG (Minimum Information about Metagenome-Assembled Genome),#endif$_MIMAG #endif$_GENOMIC|$_GENETIC #if$_TRANSCRIPTOMIC #if$_MINSEQE MINSEQE (Minimum Information about a high-throughput SEQuencing Experiment),#endif$_MINSEQE #endif$_TRANSCRIPTOMIC #if$_TRANSCRIPTOMIC #if$_MIAME MIAME (Minimum Information About a Microarray Experiment),#endif$_MIAME #endif$_TRANSCRIPTOMIC #if$_IMAGE #if$_REMBI REMBI (Recommended Metadata for Biological Images),#endif$_REMBI #endif$_IMAGE #if$_PROTEOMIC #if$_MIAPE MIAPE (Minimum Information About a Proteomics Experiment),#endif$_MIAPE #if$_MIMIX MIMix (Minimum Information about any (X) Sequence),#endif$_MIMIX #endif$_PROTEOMIC #if$_METABOLOMIC #if$_METABOLIGHTS Metabolights-Einreichungskonforme Standards werden für metabolomische Daten verwendet, wo dies von den Konsortialpartnern akzeptiert wird.#issuewarning Einige Metabolomik-Partner betrachten Metabolights nicht als akzeptierten Standard.#endissuewarning #endif$_METABOLIGHTS #endif$_METABOLOMIC Als Teil der Pflanzenforschungsgemeinschaft verwenden wir #if$_MIAPPE MIAPPE für Phänotypisierungsdaten im weitesten Sinne, werden aber auch auf #endif$_MIAPPE spezifische SOPs für zusätzliche Annotationen #if$_DATAPLANT zurückgreifen, die fortgeschrittene DataPLANT-Annotationen und Ontologien berücksichtigen. #endif$_DATAPLANT

In dem Fall, dass einige Metadaten noch fehlen, werden diese von den experimentellen Wissenschaftlern und dem Datenbeauftragten dokumentiert. #if$_DATAPLANT Rohdaten-Identifier und Parser, die von DataPLANT bereitgestellt werden, um Metadaten direkt aus der Rohdatei zu extrahieren. Die aus der Rohdatei gesammelten Metadaten können auch verwendet werden, um die zuvor gesammelten Metadaten zu validieren, falls Fehler auftreten. #endif$_DATAPLANT Wir sehen vor, #if$_RNASEQ|$_GENOMIC z.B.#if$_MINSEQE MinSEQe für Sequenzierungsdaten zu verwenden und#endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC Metabolights-kompatible Formulare für Metaboliten sowie MIAPPE für phänotypische Daten. Letzteres ermöglicht die Integration von Daten über Projekte hinweg und stellt sicher, dass etablierte und getestete Protokolle wiederverwendet werden. Darüber hinaus werden wir Ontologiebegriffe verwenden, um die Datensätze mit freien und offenen Ontologien anzureichern. Zusätzlich könnten zusätzliche Ontologiebegriffe erstellt und während des $_PROJECT kanonisiert werden.

Legitimität

Im Moment erwarten wir keine ethischen oder rechtlichen Probleme beim Datenaustausch. In Bezug auf Ethik, da es sich um Pflanzendaten handelt, ist kein Ethikkomitee erforderlich, jedoch wird Sorgfalt bei der Aufteilung der Vorteile von Pflanzenressourcen berücksichtigt. #issuewarning Sie müssen hier überprüfen und jegliche Sorgfaltspflicht hier eintragen. Im Moment warten wir, ob Nagoya (🡺siehe Nagoya-Protokoll) auch Teil der Sequenzinformationen wird. In jedem Fall, wenn Sie Material verwenden, das nicht aus Ihrem (Partner-)Land stammt und dieses physikalisch charakterisieren, z.B. Metaboliten, Proteom, biochemisch RNASeq usw., könnte dies eine Nagoya-relevante Aktion darstellen, es sei denn, es stammt z.B. aus den USA (kein Partner), Irland (nicht unterzeichnet, trotzdem kontaktieren) usw., aber andere Gesetze könnten gelten…. #endissuewarning

Die einzigen personenbezogenen Daten, die möglicherweise gespeichert werden, sind der Name und die Zugehörigkeit des Einreichers in den Metadaten der Daten. Darüber hinaus werden personenbezogene Daten für Verbreitungs- und Kommunikationsaktivitäten gesammelt, wobei spezifische Methoden und Verfahren verwendet werden, die von den $_PROJECT-Partnern entwickelt wurden, um den Datenschutz einzuhalten. #issuewarning Sie müssen informieren und besser eine SCHRIFTLICHE Zustimmung einholen, dass Sie E-Mails und Namen oder sogar Pseudonyme wie Twitter-Handles speichern, wir entschuldigen uns sehr für diese Probleme, die wir nicht erfunden haben. #endissuewarning

Data Sharing

Falls Daten nur innerhalb des Konsortiums geteilt werden, wenn die Daten noch nicht fertig sind oder sich in der IP-Prüfung befinden, werden die Daten intern gehostet und der Benutzername und das Passwort werden benötigt (siehe auch unsere GDPR-Regeln). Wenn Daten unter finalen EU- oder US-Repositorys öffentlich gemacht werden, ist normalerweise ein vollständig anonymer Zugang erlaubt. Dies ist auch bei ENA der Fall und beide entsprechen den GDPR-Anforderungen.

Es wird keine Einschränkungen geben, sobald die Daten öffentlich gemacht werden. #if$_early Einige Rohdaten werden sofort nach ihrer Erfassung und Verarbeitung öffentlich gemacht.#endif$_early #if$_beforepublication Relevante verarbeitete Datensätze werden öffentlich gemacht, wenn die Forschungsergebnisse veröffentlicht werden.#endif$_beforepublication #if$_endofproject Am Ende des Projekts werden alle Daten ohne Sperrfrist veröffentlicht.#endif$_endofproject #if$_embargo Daten, die einer Sperrfrist unterliegen, sind bis zum Ende der Sperrfrist nicht öffentlich zugänglich.#endif$_embargo #if$_request Daten werden auf Anfrage verfügbar gemacht, was eine kontrollierte Weitergabe ermöglicht und gleichzeitig eine verantwortungsvolle Nutzung sicherstellt.#endif$_request #if$_ipissue IP-Probleme werden vor der Veröffentlichung überprüft. #endif$_ipissue Alle Konsortialpartner werden ermutigt, Daten vor der Veröffentlichung zugänglich zu machen, offen und/oder unter Vorveröffentlichungsvereinbarungen #if$_GENOMIC wie die in Fort Lauderdale gestarteten und durch den Toronto International Data Release Workshop festgelegten Vereinbarungen. #endif$_GENOMIC Dies wird umgesetzt, sobald die IP-bezogenen Überprüfungen abgeschlossen sind.

Die Daten werden zunächst den $_PROJECT Partnern zugutekommen, aber auch ausgewählten Stakeholdern, die eng in das Projekt eingebunden sind, und dann der wissenschaftlichen Gemeinschaft, die an $_STUDYOBJECT arbeitet. $_DATAUTILITY Darüber hinaus können auch die allgemeine Öffentlichkeit, die an $_STUDYOBJECT interessiert ist, die Daten nach der Veröffentlichung nutzen. Die Daten werden gemäß dem Verbreitungs- und Kommunikationsplan des $_PROJECT verbreitet, #if$_DATAPLANT der sich mit der DataPLANT-Plattform oder anderen Mitteln abstimmt #endif$_DATAPLANT.

Datenerhalt

Wir erwarten, dass wir Rohdaten im Bereich von $_RAWDATA GB an Daten generieren. Die Größe der abgeleiteten Daten wird etwa $_DERIVEDDATA GB betragen.

#if$_DATAPLANT Da das $_PROJECT eng mit DataPLANT abgestimmt ist, werden der ARC-Konverter und DataHUB verwendet, um die Endpunkt-Repositories zu finden und die Daten automatisch in die Repositories hochzuladen. #endif$_DATAPLANT

Die Daten werden über die $_PROJECT-Plattform mit einer benutzerfreundlichen Oberfläche verfügbar gemacht, die eine Datenvisualisierung ermöglicht. Die Endpunkt-Repositories sind: #if$_GENETIC #if$_GENBANK NCBI-GenBank,#endif$_GENBANK #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_GENETIC #if$_TRANSCRIPTOMIC|$_GENETIC #if$_SRA NCBI-SRA,#endif$_SRA #if$_GEO NCBI-GEO,#endif$_GEO #endif$_TRANSCRIPTOMIC|$_GENETIC #if$_TRANSCRIPTOMIC #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_TRANSCRIPTOMIC #if$_IMAGE #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE #if$_METABOLOMIC #if$_METABOLIGHTS EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics Workbench,#endif$_METAWORKBENCH #if$_INTACT Intact (Molecular interactions),#endif$_INTACT #endif$_METABOLOMIC #if$_PROTEOMIC #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI Chebi (Chemical Entities of Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC #if$_PHENOTYPIC #if$_edal e!DAL-PGP (Plant Genomics & Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC #if$_OTHEREP und $_OTHEREP werden auch verwendet, um Daten zu speichern und die Daten werden dort ebenfalls verarbeitet.#endif$_OTHEREP

Die Einreichung ist kostenlos, und es ist das Ziel (zumindest von ENA), so viele Daten wie möglich zu erhalten. Daher sind Absprachen weder notwendig noch sinnvoll. Catch-all-Repositories sind nicht erforderlich. #if$_DATAPLANT Für DataPLANT wurde dies vereinbart. #endif$_DATAPLANT #issuewarning Wenn keine Datenmanagementplattform wie DataPLANT verwendet wird, müssen Sie ein geeignetes Repository finden, um Ihre Daten nach der Veröffentlichung zu speichern oder zu archivieren. #endissuewarning

Data management plan of $_PROJECT for BBSRC

a document template