
Gitaly hooks symlinks may be deleted after 10 days of uptime when 'noatime' is set on /tmp, preventing all tasks executed by hooks from running

Summary

Update: See Will's comment for what is actually happening. With noatime set on /tmp, or after long periods of inactivity, the Git hook symlinks get deleted. Amazon Linux sets noatime on /tmp by default. The current workarounds are to remove noatime, modify the service that deletes /tmp files, or touch the symlinks periodically. On most distros, overriding the tmpfiles.d service behavior is a simple way to avoid this issue. See the original description below.
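The "touch periodically" workaround could be sketched roughly as follows. This is only an illustration, not something taken from the Gitaly codebase: the function name refresh_hook_symlinks is made up, and the directory that holds the hook symlinks is not stated in this issue, so it is passed in as an argument rather than guessed.

```shell
#!/bin/sh
# Refresh the timestamps of every symlink under a directory so that an
# age-based /tmp cleaner sees them as recently used.
# Assumption: the caller knows where the Gitaly hook symlinks live; this
# issue does not give the path, so it is passed in as an argument.
refresh_hook_symlinks() {
    hooks_dir="$1"
    if [ ! -d "$hooks_dir" ]; then
        echo "not a directory: $hooks_dir" >&2
        return 1
    fi
    # -type l selects symlinks; touch -h updates the link itself, not its target.
    find "$hooks_dir" -type l -exec touch -h {} +
}
```

Run from cron or a systemd timer at an interval well under the cleaner's age threshold, this would keep the symlinks from looking stale. Whether the cleaner on a given distro honors these timestamps should be verified before relying on it.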

On some GitLab self-managed instances (two cases that I know of so far, both on 14.8.2), Gitaly stops firing post-receive hooks across the entire instance, which breaks several features that depend on them.

I learned about this when working with a customer on a support ticket (internal), including an extended call. Additionally, @wchandler was able to reproduce it briefly.

Potential workaround

Restarting Gitaly may be enough to work around the issue for now:

sudo gitlab-ctl restart gitaly

Steps to reproduce

Right now, we aren't sure how to reproduce this.

What is the current bug behavior?

Post-receive hooks don't fire. As a result, many other parts of GitLab stop working:

  1. Pushes from a terminal don't provide a link to create a merge request.

  2. Activity is not logged (such as in the Project information -> Activity page).

  3. Cached repository information is stale.

    This results in at least a couple of odd behaviors on a new branch created while this bug is occurring:

    • Clicking a file gives inactive Edit and Web IDE buttons, with a popover that states "You can only edit files when you are on a branch."

    • Merge requests have the following message next to the inactive Merge button:

      The source branch `new_branch` does not exist. Please restore it or use a different branch.

    These may be fixed by running sudo gitlab-rake cache:clear.

  4. GitLab CI/CD pipelines are not triggered.

  5. Webhooks/integrations are not triggered.

What is the expected correct behavior?

Post-receive hooks should fire. Gitaly logs should have entries that look like the following (on a single node instance):

{
  "content_length_bytes": 217,
  "correlation_id": "01FY7GRMGVMNN260XZB5NY3TPP",
  "duration_ms": 51,
  "level": "info",
  "method": "POST",
  "msg": "Finished HTTP request",
  "status": 200,
  "time": "2022-03-15T19:27:04.348Z",
  "url": "http://unix/api/v4/internal/post_receive"
}

Then, the rest of the GitLab functionality triggered by the built-in post-receive hooks should work.

Relevant logs and/or screenshots

Right now, the surest sign of this in the logs seems to be the absence of calls to the post_receive API endpoint.
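A quick way to check for that absence might look like the sketch below. It assumes the Gitaly log holds one JSON object per line (the sample entry in this issue is pretty-printed for readability), and the function name count_post_receive_calls is invented for illustration.

```shell
#!/bin/sh
# Count log lines that record a call to the internal post_receive endpoint.
# Assumption: the log file holds one JSON object per line.
count_post_receive_calls() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read: $log_file" >&2
        return 2
    fi
    # grep -c prints 0 and exits with status 1 when nothing matches;
    # ignore that status so a zero count is not treated as a failure.
    grep -c '/api/v4/internal/post_receive' "$log_file" || true
}
```

On an Omnibus install the Gitaly log is typically /var/log/gitlab/gitaly/current, though that path is an assumption here and may differ on other setups. A count of 0 after a recent push would be consistent with the behavior described in this issue.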

Output of checks

Results of GitLab environment info

(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:env:info`)

(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)

Results of GitLab application Check

I will ask the customer to run the commands to fill out the information below, but their instance was a single node on 14.8.2, and not doing anything unusual.

(For installations with omnibus-gitlab package run and paste the output of: sudo gitlab-rake gitlab:check SANITIZE=true)

(For installations from source run and paste the output of: sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true)

(we will only investigate if the tests are passing)

Possible fixes

I've yet to track down how/where this could be happening.

Edited Apr 18, 2022 by Andrew Conrad