
Git Troubleshooting & Tips

last updated at 2024-07-22

About this guide
User: Data Steward
Mode: Tutorial
Background

Some reasons why we now sometimes run into Git issues

Debugging
  1. (if required) Install Git on user machine

    💡 check installation via git --version in a fresh command line / terminal / powershell window

  2. navigate to the ARC in trouble (via one of many options below)

  3. try some of the git commands and debugging below

💡 This is not an exhaustive troubleshooting list. In most cases Git and search engines are your friends. Most Git error messages (displayed in the command line or inside ARCitect) include helpful commands to solve the problem or can easily be searched for on the internet.

Error messages
| error message* | possible reason | possible solution |
|---|---|---|
| remote: HTTP Basic: Access denied fatal: Authentication failed for 'https://gitlab.nfdi4plants.de/UserName/ARCname' | Your computer is not "linked" to your DataHUB account | Access denied |
| error: failed to push some refs to 'https://gitlab.nfdi4plants.de/UserName/ARCname' hint: Your push was rejected due to missing or corrupt local objects. | You tried to upload LFS-tracked files that are not present on your computer | Git LFS |
| error: failed to push some refs to 'https://gitlab.nfdi4plants.de/UserName/ARCname' hint: Updates were rejected because the remote contains work that you do not have locally. | Your local ARC is out of sync with the remote. | ARC not in sync with the DataHUB |
| ERROR: Can not sync with remote as no remote repository address was specified. | There is no URL specified for your ARC's remote | Git remote |
| ERROR: GIT: fatal: repository 'https://gitlab.nfdi4plants.de/UserName/ARCname.git' not found | The remote URL does not exist | Git remote |
| ERROR: GIT: fatal: detected dubious ownership | This is an error typically seen when working on mounted network drives | Dubious ownership |
| fatal: credential-cache unavailable; no unix socket support | Likely happens on Windows, if a gitconfig credential.helper=cache | Adjust the Git Credential helper setting |
| fatal: Need to specify how to reconcile divergent branches. | Your ARC contains multiple branches that progressed independently and need to be merged | Contact a data steward. |
| error: unable to create file <path/to/file> : Filename too long | Likely occurs on Windows, if your ARC is stored in a deeply nested folder, i.e. a folder in a folder in a folder ... | Store the ARC on a higher level. |
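
As a rough triage aid, the table above can be turned into a small lookup. This is only a sketch: the function name classify_git_error is hypothetical (it is not part of ARC Commander, ARCitect or Git), and the matched text fragments are simply copied from the error messages listed in the table.

```shell
# Hypothetical helper (not part of any DataPLANT tool): map a Git error
# message to the solution named in the table of this guide.
classify_git_error() {
  case "$1" in
    *"Access denied"*|*"Authentication failed"*)
      echo "Access denied" ;;
    *"missing or corrupt local objects"*)
      echo "Git LFS" ;;
    *"remote contains work that you do not have locally"*)
      echo "ARC not in sync with the DataHUB" ;;
    *"no remote repository address"*|*"repository "*"not found"*)
      echo "Git remote" ;;
    *"dubious ownership"*)
      echo "Dubious ownership" ;;
    *"credential-cache unavailable"*)
      echo "Git Credential Helper" ;;
    *"divergent branches"*)
      echo "Contact a data steward" ;;
    *"Filename too long"*)
      echo "Store the ARC on a higher level" ;;
    *)
      echo "unknown" ;;
  esac
}

# Example call with one of the messages from the table
classify_git_error "ERROR: GIT: fatal: detected dubious ownership"
```

The order of the patterns matters only for messages that contain several of the fragments; anything not listed falls through to "unknown" and still needs a manual search.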

💡 *typically displayed during synchronization via ARCitect (DataHUB Sync --> push / pull) or arc sync. Even if ARCitect shows "Complete", it's sometimes worth it to scroll up and see these errors.

Your two favorite Git commands: status and log

Whenever you're asked for ARC support that is likely related to a Git issue, the first thing you want to explore is the state of the ARC.

git status

To get a good summary of the current state of the ARC, use

git status

If everything's clear and committed, this should print something like

Your branch is up to date with ...
nothing to commit, working tree clean

git log

Now, to compare the status of the local clone with that of the remote (i.e. the DataHUB) in more detail, use

git log

This displays the commit history (messages) of the ARC reverse-chronologically, i.e. top-most = latest. So if the top commit message of the local ARC is different from the last commit message displayed in the DataHUB, the ARC is out of sync.
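
Comparing commit messages by eye works, but the same check can also be sketched with commit counts. The following assumes that the current branch has an upstream (remote-tracking) branch and that git fetch has been run beforehand; describe_sync_state is a hypothetical helper name chosen for this example.

```shell
# Hypothetical helper: interpret "ahead" and "behind" commit counts.
describe_sync_state() {
  ahead=$1
  behind=$2
  if [ "$ahead" -eq 0 ] && [ "$behind" -eq 0 ]; then
    echo "in sync"
  elif [ "$behind" -eq 0 ]; then
    echo "local ARC is $ahead commit(s) ahead: push needed"
  elif [ "$ahead" -eq 0 ]; then
    echo "local ARC is $behind commit(s) behind: pull needed"
  else
    echo "diverged: $ahead ahead, $behind behind"
  fi
}

# Inside an ARC the two counts can be obtained like this
# (left number = commits only in the local ARC, right number = only on the remote):
#   git fetch
#   git rev-list --left-right --count 'HEAD...@{upstream}'
describe_sync_state 2 0
```

The "diverged" case corresponds to the situation in which both sides progressed independently; as noted in the error table, that is a good moment to contact a data steward.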

If you like it prettier, remember "a dog"...

git log --all --decorate --oneline --graph

Hit q to close the log.

Git configuration

The gitconfig holds the settings and preferences for your Git installation. There are three types of gitconfigs. Depending on the tool (ARCitect, ARC Commander) and operating system (macOS, Linux, Windows), different Git settings may be read from different config files.

| flag | meaning |
|---|---|
| --global | current user on that computer |
| --system | system-wide (all users) |
| --local | current repository (ARC) |

The following command lists all configurations, where they originate from (--show-origin) and what their scope is (--show-scope).

git config --list --show-origin --show-scope

💡 The output will be different depending on whether you are inside or outside an ARC (git repository).
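
When the full list is long, it can help to filter it for a single setting. The sketch below assumes that each output line consists of three tab-separated columns (scope, origin, key=value); show_setting is a hypothetical helper name and the example values are made up.

```shell
# Hypothetical helper: read the output of
#   git config --list --show-origin --show-scope
# from stdin and print scope and value of one setting.
# Assumption: tab-separated columns scope, origin, key=value.
show_setting() {
  awk -F '\t' -v key="$1" '
    {
      split($3, kv, "=")                       # kv[1] is the setting name
      if (tolower(kv[1]) == tolower(key)) {
        value = substr($3, length(kv[1]) + 2)  # everything after the first "="
        print $1 ": " value
      }
    }'
}

# Example with made-up values; on a real machine you would run
#   git config --list --show-origin --show-scope | show_setting credential.helper
printf 'global\tfile:/home/jane/.gitconfig\tcredential.helper=cache\nlocal\tfile:.git/config\tcredential.helper=store\n' |
  show_setting credential.helper
```

If a setting shows up in several scopes, Git normally uses the most specific one, so a --local value takes precedence over --global, which in turn takes precedence over --system.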

In order to only show e.g. the global gitconfig use

git config --global --list

Typical settings to explore and trouble-shoot

Changing git config

Editing the respective gitconfig is ideally done via command line (quick internet search helps). 💡 One could edit the file (listed in git config --list --show-origin) via a text editor. However, this is rather error-prone.

Adapt user name and email

git config --global user.name "Your Name"
git config --global user.email "Your eMail"

Set main as default branch

git config --global init.defaultBranch main

Git Credential Helper

The gitconfig contains a setting called credential.helper, which defines whether and how Git credentials are saved on your machine.

On Windows, you might run into the error fatal: credential-cache unavailable; no unix socket support, if it is set to credential.helper=cache.

This can be solved by either of the following:

  1. Remove "credential.helper=cache" via git config --global --unset credential.helper.
  2. Overwrite the setting with "store" instead of "cache" via git config --global credential.helper store.

💡 If you use ARC Commander, we recommend using the second approach to keep storing your credentials for DataHUB synchronization.

Git remote

For ARCs the "remote" is the DataHUB. The remote address (ARC URL) is stored in the Git configuration of the local ARC. To display the URL to which the local ARC is connected, use

git remote -v

Adding a remote during arc sync

A default remote is usually added by ARC Commander or ARCitect. If the ARC does not yet exist in the DataHUB, and you created it via ARC Commander and synced it via arc sync, you will see this error:

ERROR: GIT: fatal: repository 'https://gitlab.nfdi4plants.de/UserName/ARCname.git/' not found
GIT: warning: redirecting to https://gitlab.nfdi4plants.de/UserName/ARCname.git/
...
GIT: remote: The private project UserName/ARCname was successfully created.

This is nothing to worry about: the ARC was created in the DataHUB during this process.

If you only see the error ERROR: GIT: fatal: repository 'https://gitlab.nfdi4plants.de/UserName/ARCname.git/' not found, but not the following lines mentioning that the ARC was created automatically, make sure to use the "force", i.e. arc sync --force ....
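
The distinction between the two cases can be sketched as a small check on captured output. This is an illustration only: check_remote_created is a hypothetical helper name, and it merely looks for the two text fragments quoted above ("not found" and "was successfully created") in whatever is piped into it.

```shell
# Hypothetical helper: read arc sync output from stdin and report whether
# the "not found" error was followed by the automatic creation of the ARC.
check_remote_created() {
  log=$(cat)
  case "$log" in
    *"not found"*"was successfully created"*)
      echo "ok: ARC was created in the DataHUB" ;;
    *"not found"*)
      echo "remote missing: retry with arc sync --force" ;;
    *)
      echo "no 'not found' error in output" ;;
  esac
}

# Example with the messages quoted in this section
printf "ERROR: GIT: fatal: repository 'https://gitlab.nfdi4plants.de/UserName/ARCname.git/' not found\nGIT: remote: The private project UserName/ARCname was successfully created.\n" |
  check_remote_created
```

With a real ARC, the output of the sync could be piped into the function, e.g. arc sync 2>&1 | check_remote_created.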

Adding a remote via git

If the command git remote -v does not display any remote, you can add one via

git remote add origin https://gitlab.nfdi4plants.de/<UserName>/<ARCName>

Editing a remote

You can edit a remote via

git remote set-url origin https://gitlab.nfdi4plants.de/<UserName>/<ARCName>

Branches

As of now, the DataPLANT tools focus on working on a single branch (main). It can still happen that your ARC has multiple branches, e.g. by accident (see git config --> init.defaultBranch) or because some Git-savvy collaborator knows how to create them. To display the branches of the local ARC, use

git branch

💡 the current branch is marked with an asterisk (*) to the left
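
Since the tools expect to work on main, a quick check of the asterisk line can be sketched as follows. check_current_branch is a hypothetical helper name; it only parses the text that git branch prints and assumes that the current branch line starts with "* ".

```shell
# Hypothetical helper: read `git branch` output from stdin and warn
# if the checked-out branch is not the expected one (default: main).
check_current_branch() {
  expected=${1:-main}
  current=$(sed -n 's/^\* //p')   # keep only the line marked with the asterisk
  if [ "$current" = "$expected" ]; then
    echo "on $expected"
  else
    echo "warning: on '$current', expected '$expected'"
  fi
}

# Example with made-up branch names; inside an ARC: git branch | check_current_branch
printf '  dev\n* main\n' | check_current_branch
```

Recent Git versions can also print the current branch name directly via git branch --show-current.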

If you also want to display branches that exist on the remote (but not locally), use

git branch --all

Common issues and error messages

ARC (files) open in multiple programs

A common source of issues is multiple programs working on the ARC in parallel.

ARC not in sync with the DataHUB

Your local ARC is likely out of sync with the remote. This happens if you or an invited colleague work on the same ARC from a different location (e.g. the DataHUB or another computer). Before working on your ARC, make sure to update the local clone, e.g. via ARCitect (DataHUB Sync --> pull), arc sync or git pull.

Access denied

Sometimes you run into permission issues such as

remote: HTTP Basic: Access denied. The provided password or token is incorrect or your account has 2FA enabled and you must use a personal access token instead of a password.
fatal: Authentication failed for 'https://gitlab.nfdi4plants.de/UserName/ARCName.git/'

This is due to missing or outdated DataHUB credentials on your computer. It usually helps to just retrieve new ones. If not, you might have to remove existing credentials stored on your computer.

Authenticate the computer

Option 1: via ARC Commander

Option 2: "by hand"

  1. Log in to the DataHUB
  2. Create a new Personal Access Token (PAT) with scope api
  3. Run a git command (e.g. arc sync, git pull) to trigger being asked for git credentials
    1. Provide your DataHUB username
    2. Use the token instead of your password

Delete stored credentials

If (new) authentication alone does not help, you might need to delete existing tokens or passwords first.

  1. Run git config --get-regexp "credential" to find out whether and where credentials are stored

  2. This typically displays one of the following

    credential.helper store

    credential.helper osxkeychain (only on macOS)

  3. If credential.helper store is displayed, the credentials are typically stored in ~/.git-credentials, a hidden text file stored in the user's home folder. Edit this file and delete the row(s) containing "git.nfdi4plants.org" (https://<UserName>:<Token>@git.nfdi4plants.org).

  4. On macOS (if credential.helper osxkeychain is displayed) open the app "Keychain Access", search and delete passwords for "git.nfdi4plants.org".
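
For the credential.helper store case, the manual edit of the credentials file can be sketched as a small function. This is an assumption-laden illustration, not an official tool: remove_stored_credentials is a hypothetical name, the file entries in the demonstration are made up, and the function keeps a backup copy (which still contains the old token and should be deleted once everything works).

```shell
# Hypothetical helper: remove all lines mentioning a host from a
# credentials file, keeping a backup copy next to it.
remove_stored_credentials() {
  file=$1
  host=$2
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  cp "$file" "$file.bak"
  # grep -v exits with status 1 if no lines remain, which is fine here
  grep -vF "$host" "$file.bak" > "$file" || true
}

# Demonstration on a temporary file with made-up entries
demo=$(mktemp)
printf 'https://jane:TOKEN1@git.nfdi4plants.org\nhttps://jane:TOKEN2@example.org\n' > "$demo"
remove_stored_credentials "$demo" "git.nfdi4plants.org"
cat "$demo"
```

On a real machine the call would be remove_stored_credentials ~/.git-credentials git.nfdi4plants.org, after which the next git pull or arc sync should ask for the username and token again.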

Dubious ownership

The error ERROR: GIT: fatal: detected dubious ownership typically occurs when working on a mounted network drive (Fileshare, File Server, NAS). Very simplified: the user on the computer and the owner of the network drive differ, and Git tries to protect you from working in a folder you do not own.

You can add the path to the ARC to the list of safe directories via the command

git config --global --add safe.directory %(prefix)///servername/share/path/to/ARC/
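
The value passed to safe.directory in the command above follows a fixed pattern: the network path with forward slashes, prefixed by %(prefix)/. A minimal sketch that builds this value from a UNC path is shown below; safe_directory_value is a hypothetical helper name, and servername/share are the placeholders from the command above.

```shell
# Hypothetical helper: turn a UNC path (\\server\share\... or //server/share/...)
# into the form used in the safe.directory command above.
safe_directory_value() {
  path=$(printf '%s' "$1" | tr '\\' '/')   # backslashes to forward slashes
  printf '%s\n' "%(prefix)/$path"
}

safe_directory_value '//servername/share/path/to/ARC/'
```

The result can then be passed on, e.g. git config --global --add safe.directory "$(safe_directory_value '//servername/share/path/to/ARC/')".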

Alternatively, you can circumvent this error by adding all directories to your list of safe directories via the command (the quotes keep the shell from expanding the asterisk)

git config --global --add safe.directory "*"

⚠️ This might however pose a security risk. Please read the details here: https://www.git-scm.com/docs/git-config#Documentation/git-config.txt-safedirectory

Git LFS

Git LFS (Large File Storage) is the system working in the background that simplifies working with Git and (ARCs containing) large data files. ARC Commander and ARCitect offer options to download (clone) an ARC without large files. This speeds up the process and avoids wasting data storage if you are only interested e.g. in the metadata.

If you have downloaded (cloned) an ARC without large files and try to upload it to a new location (i.e. new remote due to a transfer to other user, group, etc.), you will see the following or similar error

hint: Your push was rejected due to missing or corrupt local objects.
error: failed to push some refs to 'https://gitlab.nfdi4plants.de/UserName/ARCName.git'

In this case you would have to download all LFS objects from the original remote first → ask a data steward for help.

Step-by-step track large file(s) via lfs

This is done in small steps while capturing the log output. Run the commands one at a time: the trailing & runs a command in the background, so wait until the commit has finished before pushing.

git lfs track "assays/RNAseq_RawData/dataset/**" ## Track files via LFS (this adds them to .gitattributes)
git add .gitattributes ## git track .gitattributes first
git add assays/RNAseq_RawData/dataset/* ## git track the large files
GIT_CURL_VERBOSE=1 GIT_TRACE=1 GIT_TRACE_PACKET=1 git commit -m "add rnaseq files to LFS" -v >> git-commit-LFS.log 2>&1 &
GIT_CURL_VERBOSE=1 GIT_TRACE=1 GIT_TRACE_PACKET=1 git push -v >> git-push-LFS.log 2>&1 &

Check the status of lfs files

git lfs status

List LFS-tracked files

To get a list of LFS-tracked files including the size of the original file, run

git lfs ls-files -ls

This will display the object ID (oid), the relative path to the file and the object size. The oid is also stored in the pointer file at the file's position.

💡 If checked-out and downloaded, a file with an oid 77080c4dc5820ede3e992e8116772ae6ec6ba6096e05df4e49fbb5f0665544b2 would be in the folder .git/lfs/objects/77/08/. So the first 4 characters of the oid are split into two subfolders of .git/lfs/objects/ (i.e. /77/08/).
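
The folder rule can be written down as a short function. This is a sketch: lfs_object_path is a hypothetical helper name, and it additionally assumes that the object file inside the folder is named after the full oid.

```shell
# Hypothetical helper: derive the expected location of an LFS object
# inside the local ARC from its oid.
# Assumption: the file in the /xx/yy/ folder is named after the full oid.
lfs_object_path() {
  oid=$1
  first=$(printf '%s' "$oid" | cut -c1-2)    # characters 1-2 -> first subfolder
  second=$(printf '%s' "$oid" | cut -c3-4)   # characters 3-4 -> second subfolder
  echo ".git/lfs/objects/$first/$second/$oid"
}

lfs_object_path 77080c4dc5820ede3e992e8116772ae6ec6ba6096e05df4e49fbb5f0665544b2
```

Checking whether that path exists (e.g. with ls) is a quick way to see if the large file itself, and not just its pointer, is present locally.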

Debug LFS-tracked files

To get a report of all LFS-tracked files including their status, use

git lfs ls-files -d

Amongst others, this report will print for every LFS file, whether it is downloaded (checkout: true; download: true) to the local ARC or not (checkout: false; download: false).
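
For ARCs with many large files, the report can be condensed to the files that are not yet downloaded. The sketch below assumes that the report prints a "filepath: ..." line per file, followed by a "download: true" or "download: false" line; list_missing_lfs_files is a hypothetical helper name and the example paths are made up.

```shell
# Hypothetical helper: read `git lfs ls-files -d` output from stdin and
# print the paths of files that are not downloaded.
# Assumption: each file is reported as "filepath: ..." followed by
# "download: true" or "download: false".
list_missing_lfs_files() {
  awk '
    $1 == "filepath:" { sub(/^[ \t]*filepath:[ \t]*/, ""); path = $0; next }
    $1 == "download:" && $2 == "false" { print path }
  '
}

# Example with made-up report lines; inside an ARC:
#   git lfs ls-files -d | list_missing_lfs_files
printf 'filepath: assays/a/raw1.fastq\n    size: 1024\ncheckout: true\ndownload: true\nfilepath: assays/a/raw2.fastq\n    size: 2048\ncheckout: false\ndownload: false\n' |
  list_missing_lfs_files
```

An empty result means that all LFS-tracked files are present locally, which is the precondition for pushing the ARC to a new remote.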

Get more log

To help troubleshooting add (some or all) variables GIT_CURL_VERBOSE=1 GIT_TRACE=1 GIT_TRACE_PACKET=1 before your git command to get more info, e.g.

GIT_CURL_VERBOSE=1 GIT_TRACE=1 GIT_TRACE_PACKET=1 git push -v >> git-push-LFS.log 2>&1 &
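
To avoid retyping the three variables, the pattern can be wrapped in a small function. This is only a sketch; git_traced is a hypothetical name, and unlike the command above it runs in the foreground so that the log is complete when the function returns.

```shell
# Hypothetical wrapper: run any git command with verbose tracing and
# append everything (stdout and stderr) to a log file.
# Usage: git_traced <logfile> <git arguments...>
git_traced() {
  logfile=$1
  shift
  GIT_CURL_VERBOSE=1 GIT_TRACE=1 GIT_TRACE_PACKET=1 git "$@" >> "$logfile" 2>&1
}

# Example (inside an ARC):
#   git_traced git-push-LFS.log push -v
```

The resulting log files can become large and may contain server addresses and user names, so it is worth skimming them before sending them to a data steward.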

DataPLANT Support

Besides these technical solutions, DataPLANT supports you with community-engaged data stewardship. For further assistance, feel free to reach out via our helpdesk or by contacting us directly.