
refs changed while git fsck is running are likely to be reported as missing, resulting in false positives for repository checks


DISCLAIMER: This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may be subject to change or delay and remain at the sole discretion of GitLab Inc.


Summary

Customers report false positives from repository checks, with missing commits being incorrectly reported. Investigation identifies that these are, or were, HEAD commits on branches, so the reference to each SHA has come from the refs in the repository.

It's most commonly seen on large repositories that take a long time to check.

It appears that git fsck builds its list of valid objects once and doesn't keep track of objects added to the repository afterwards; when it then checks all the refs against that list, refs that have changed during the git fsck run appear to point at missing objects.

Planned resolution

We believe this is an inherent race in Git. The planned solution is to serialize transactions, which is Gitaly functionality we are getting ready for general availability. See &13306

Workaround

Overview

Most changes to a branch or tag in a Git repository while git fsck is running will cause the check to report an error.

At a high level, the goal of the workaround is to reduce false positives. In detail, it aims to:

  1. Reduce the chance of git fsck running while changes are being made, by allowing Administrators to schedule running the check.

  2. Try to get a 'clean' run by attempting the check up to three times, as coded (see the retry sketch after this list). Retries are only performed if a failure occurs; once a 'clean' check has been obtained, the script is done with that project and writes the last-checked date for it.

    • If there is genuinely a fault with the repository, retrying the check will not make any difference, and GitLab will report that the check failed.
    • If a bot or a developer pushes to the repository during all three attempts, there will still be a false positive. This problem isn't expected to be completely fixed until the Gitaly transactions feature is generally available.
  3. Prevent Sidekiq from checking the repository, by running the check more frequently (every 14 days) so that the coded threshold for Sidekiq (one month) should never be reached.
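
As a rough illustration, the retry pattern looks something like the following. This is a minimal sketch, not the actual script: the check itself is stubbed out as check_passed?, the project ID is hypothetical, and the last_repository_check_* columns are the ones used by GitLab's repository check feature (verify against your version).

    # Run via: gitlab-rails runner <script>

    # Hypothetical stand-in for however the script invokes the check;
    # returns true on a 'clean' run.
    def check_passed?(project)
      raise NotImplementedError, 'replace with the actual repository check'
    end

    project = Project.find(42) # hypothetical project ID

    tries = [1, 2, 3] # add elements to allow more retries

    # any? stops at the first truthy result, so a retry only happens
    # after a failed attempt.
    clean = tries.any? do |attempt|
      puts "checking project #{project.id}, attempt #{attempt}"
      check_passed?(project)
    end

    if clean
      # Only a clean run writes the last-checked datestamp; a failed run
      # leaves the project eligible for further attempts.
      project.update_columns(last_repository_check_at: Time.current,
                             last_repository_check_failed: false)
    end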

Potential issues

The datestamp for the last check is only written by the workaround script if the check succeeds.

This allows for further attempts to be made to get a successful git fsck. The workaround code and Sidekiq both check this datestamp, so either would be able to perform additional checks, but Sidekiq will record a datestamp for the failed check.

This prevents Sidekiq from re-checking for another month, but it'll also prevent the workaround code from checking for another two weeks.

If you still get false positives even after three retries, try to find a quieter time to check the repository, and/or increase the number of retries by adding more elements to the tries array.
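
If Sidekiq has recorded the datestamp for a failed check and you want the project to be eligible for checking again straight away, the datestamp can be cleared from the Rails console (sudo gitlab-rails console). This is a sketch assuming the last_repository_check_* columns used by GitLab's repository check feature; verify against your version before use:

    project = Project.find(42) # hypothetical project ID
    # Clearing the datestamp makes the project eligible for checking again,
    # by both the workaround script and Sidekiq.
    project.update_columns(last_repository_check_at: nil,
                           last_repository_check_failed: nil)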

Workaround deployment and implementation

These steps are for a packaged GitLab (Omnibus) installation. The script should work on other deployments, with some modifications such as:

  • For a GitLab deployment in Kubernetes or Docker, you'll need to identify a way to schedule the script, the paths may be different, and the script will need to be deployed each time the pod/container is spun up.
  • For self-compiled, the paths will be different.
  1. Select a GitLab node configured to run Rails (Sidekiq or Puma). If your environment has a node selected to run database migrations during upgrades, this could be a good candidate.

  2. Determine the account to use: it'll be the gitlab_user from /opt/gitlab/etc/gitlab-rails-rc.

    • Usually the account is git; the following steps assume this.
  3. Copy the script below to the server in a location readable by that account (source).

    check-repository-workaround.rb

    • Not root's home directory (read more about rails runner)
    • The following steps assume: /var/opt/gitlab/check-repository-workaround.rb
    • It does not need to be executable.
  4. If you need to check multiple projects at different times, for example to accommodate different time zones when developers aren't working, you'll need more than one copy of the script. The project list is hard-coded in the script.

  5. Determine the project IDs of affected projects.

    • The GitLab Web UI can be used to determine the project ID.
    • Affected projects will have been reported regularly as having failed repository checks, but when you re-run the check, the check passes.
    • Affected projects are likely to be large repositories. The git fsck will take longer to run, and so there's more chance of a git push to the repository while the check is running.
  6. Modify the projects array at the top of the script to list the projects that need to be checked (the expected format is shown in the sketch after this list).

  7. Set up a cron job

    echo '45 22 * * 6 git /usr/bin/gitlab-rails runner /var/opt/gitlab/check-repository-workaround.rb >> /var/log/gitlab/gitlab-rails/cron_dot_d_check-repository-workaround.log 2>&1' > /etc/cron.d/check-repository-workaround
    • This uses the assumed account (git) and path for the script (/var/opt/gitlab)
    • See man 5 crontab for details on the time and date fields - this example runs at 22:45 on a Saturday evening.
    • The log will be rotated by the logrotate service deployed as part of packaged GitLab (see gitlab-ctl status)
    • Schedule the script to run at least weekly. After a successful check, the script only checks each project every two weeks (13-14 days), but if the project check keeps failing and the script runs weekly, it gets more opportunities to obtain a successful check. If there's no successful check after 2-3 weeks, the Sidekiq job will trigger a check as well.
  8. The script can be run manually as well as, or instead of, via a cron job. Projects with a recent check (either by the script or by Sidekiq) will not get checked; the script logs this.

    sudo /usr/bin/gitlab-rails runner /var/opt/gitlab/check-repository-workaround.rb
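
For reference, a minimal sketch of the hard-coded project list (step 6) and the skip-if-recently-checked behaviour (step 8). The project IDs are hypothetical, the attribute name is the one used by GitLab's repository check feature, and the real script may be structured differently:

    projects = [42, 137] # hypothetical project IDs; edit to suit

    projects.each do |project_id|
      project = Project.find_by_id(project_id)
      unless project
        puts "project #{project_id} not found"
        next
      end

      last = project.last_repository_check_at
      if last && last > 13.days.ago
        # A recent check, whether by this script or by Sidekiq, means skip.
        puts "skipping #{project.full_path}: last checked at #{last}"
        next
      end

      # ... perform the check here (see the retry sketch in the Overview) ...
    end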

Steps to reproduce

  1. Start with a large repository so that git fsck takes long enough. Reproduced on a 4.5 GB repo (comprising the master branch of the nixpkgs repo).
    • It may be necessary to inject this repo into your test environment by transferring the .git directory directly into the git-data/repositories directory tree. More direct methods, like pushing 4.5 GB or pulling it in via project import, typically fail.
  2. Kick off a repository check
  3. Push commits and tags to the repo while the check is running (see the sketch after these steps)
  4. git fsck fails with missing commit errors.
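
A hypothetical push-churn loop for step 3, run from a working clone of the test repository while the check is in progress. The file name, tag names, and iteration count are illustrative; the original reproduction pushed a commit and a new tag every five seconds:

    # churn.rb - push a commit and a new tag every five seconds
    200.times do |i|
      File.write('churn.txt', "change #{i}\n")
      system('git', 'add', 'churn.txt')           or abort('git add failed')
      system('git', 'commit', '-m', "churn #{i}") or abort('git commit failed')
      system('git', 'tag', "churn-#{i}")          or abort('git tag failed')
      # Push the branch and the new tag explicitly (lightweight tags
      # aren't covered by --follow-tags).
      system('git', 'push', 'origin', 'HEAD', "refs/tags/churn-#{i}") or abort('git push failed')
      sleep 5
    end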

Example Project

What is the current bug behavior?

False positives about missing commits in projects.

What is the expected correct behavior?

Repository check accounts for changes to the repository that occur while the check is happening.

Relevant logs and/or screenshots

Could not fsck repository: missing tag 626f030a8754caacac6e8b54e8a39244b8a9c345
missing tag ebfc03685d9346b4c2d41ca8fe32cef026785aca
missing tag 4c3c05ba88caede6e5037a542b77d1ce948f0f82
missing tag 61c10504123f1020fa3d924c84d71d59f84849b1
missing tag 47b9c0ce0f2d69f015c5e39e7dcf1d65e9b6a603
missing tag fbcbc1b06787b65bcbebe35fa495186562b8ffcb
missing commit 2840c7e0097bcfe0caeb3ee44ef6d62c76332255
missing tag 2b2ac8d60ddd148719703924662c66d56bad8eb7
missing tag 24ecc8c98fdaf448a5fb2a509c1dab05d42d1c78
  • Reproduced by pushing to a branch and pushing a new tag every five seconds while a 27-minute repository check ran
  • 152 of the 173 tags added were reported as missing.
  • The 152nd change to the branch was also reported as missing.
  • Public test harness project

GitLab team members can read more in a confidential issue

Output of checks

Reported by a customer on both 14.x and 16.11.

Reproduced on 17.0 and 17.5.1.

Possible fixes
