Gitaly hooks symlinks may be deleted after 10 days of uptime when 'noatime' is set on /tmp, preventing all tasks executed by hooks from running
Summary
Update: See Will's comment for what is actually happening. `noatime` on `/tmp`, or long periods of inactivity, result in Git hook symlinks getting deleted. Amazon Linux has `noatime` on `/tmp` by default. The current workarounds to prevent this would be to remove `noatime`, modify the service which deletes `/tmp` files, or `touch` the file periodically. You may override the `tmpfiles.d` service behavior as a simple workaround to avoid this issue on most distros. See the original description below.
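As a rough illustration of the "touch the file periodically" workaround, here is a minimal sketch. It assumes GNU `find` and `touch` on Linux, and that the cleanup service decides staleness from the symlink's own timestamps. The function name and the directory argument are hypothetical; this issue does not state where the hook symlinks live, so substitute the real directory from your system.

```shell
#!/bin/sh
# Refresh the timestamps of every symlink under a directory so that an
# age-based /tmp cleaner is less likely to treat them as stale.
# The directory passed in is a placeholder (assumption): use the directory
# that actually holds the Gitaly hook symlinks on your system.
refresh_hook_symlinks() {
    hooks_dir="$1"
    # -type l selects symlinks only; touch -h updates the link itself
    # rather than the file it points to.
    find "$hooks_dir" -type l -exec touch -h {} +
}
```

Something like this could be called from a daily cron job or systemd timer with the hooks directory as its argument. It is a stopgap, not a replacement for fixing the mount option or the cleanup configuration.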
Sometimes, in a GitLab self-managed instance (seen so far in two cases that I know of, both on 14.8.2), the post-receive hooks aren't being fired off by Gitaly across the entire instance, causing a lot of weird issues.
I learned about this when working with a customer on a support ticket (internal), including an extended call. Additionally, @wchandler was able to reproduce it briefly.
Potential workaround
Restarting Gitaly may be enough to work around the issue for now:
sudo gitlab-ctl restart gitaly
Steps to reproduce
Right now, we aren't sure how to reproduce this.
What is the current bug behavior?
Post-receive hooks don't fire. As a result, many other parts of GitLab stop working:
- Pushes from a terminal don't provide a link to create a merge request.
- Activity is not logged (such as in the Project information -> Activity page).
- Cached repository information is stale. This results in at least a couple of weird behaviors in a new branch created while this bug is observed:
  - Clicking a file gives inactive Edit and Web IDE buttons, with a popover which states: "You can only edit files when you are on a branch."
  - Merge requests have the following message next to the inactive Merge button: "The source branch `new_branch` does not exist. Please restore it or use a different branch."

  These may be fixed by running `sudo gitlab-rake cache:clear`.
- GitLab CI/CD pipelines are not triggered.
- Webhooks/integrations are not triggered.
What is the expected correct behavior?
Post-receive hooks should fire. Gitaly logs should have entries that look like the following (on a single node instance):
{
"content_length_bytes": 217,
"correlation_id": "01FY7GRMGVMNN260XZB5NY3TPP",
"duration_ms": 51,
"level": "info",
"method": "POST",
"msg": "Finished HTTP request",
"status": 200,
"time": "2022-03-15T19:27:04.348Z",
"url": "http://unix/api/v4/internal/post_receive"
}
Then, the rest of the GitLab functionality triggered by the built-in post-receive hooks should work.
Relevant logs and/or screenshots
Right now, the surest sign of this happening in the logs seems to be the absence of calls to the `post_receive` API endpoint.
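One way to check for that absence is to count the log lines that mention the endpoint. This is a minimal sketch: the function name is hypothetical, and the default log path is an assumption based on a typical Omnibus install, so adjust it for your setup.

```shell
#!/bin/sh
# Count the lines in a Gitaly log file that mention the internal
# post_receive endpoint. A count of 0 after recent pushes suggests the
# post-receive hooks are not firing.
# The default log path is an assumption for Omnibus installations.
count_post_receive() {
    log_file="${1:-/var/log/gitlab/gitaly/current}"
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so "|| true" keeps the function from failing.
    grep -cF '/api/v4/internal/post_receive' "$log_file" || true
}
```

For example, run `count_post_receive` before and after a test push. If the number does not increase, the hook most likely did not run.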
Output of checks
Results of GitLab environment info
Expand for output related to GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
I will ask the customer to run the commands to fill out the information below, but their instance was a single node on 14.8.2, and not doing anything unusual.
Expand for output related to the GitLab application check
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:check SANITIZE=true`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`) (we will only investigate if the tests are passing)
Possible fixes
I've yet to track down how/where this could be happening.