Gitaly hooks symlinks may be deleted after 10 days of uptime when 'noatime' is set on /tmp, preventing all tasks executed by hooks from running
<!---
Please read this!

Before opening a new issue, make sure to search for keywords in the issues
filtered by the "regression" or "type::bug" label:

- https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=regression
- https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=type::bug

and verify the issue you're about to submit isn't a duplicate.
--->

### Summary

<!-- Summarize the bug encountered concisely. -->

Update: See [Will's comment](https://gitlab.com/gitlab-org/gitaly/-/issues/4113#note_877567514) for what is actually happening. `noatime` on `/tmp`, or long periods of inactivity, result in Git hook symlinks getting deleted. Amazon Linux has `noatime` on `/tmp` by default.

**The current workarounds to prevent this** would be to remove `noatime`, modify the service which deletes `/tmp` files, or `touch` the file periodically.

You may [override the `tmpfiles.d` service behavior](https://gitlab.com/gitlab-org/gitaly/-/issues/4113#note_900051517) as a simple workaround to avoid this issue on most distros.

See the original description below.

Sometimes, in a GitLab self-managed instance (seen so far in two cases that I know of, both on 14.8.2), the post-receive hooks aren't being fired off by Gitaly across the entire instance, causing a lot of weird issues.

I learned about this when working with a customer on a [support ticket (internal)](https://gitlab.zendesk.com/agent/tickets/274405), including an extended call. Additionally, @wchandler was able to reproduce it briefly.

#### Potential workaround

Restarting Gitaly may be enough to work around the issue for now:

```
sudo gitlab-ctl restart gitaly
```

### Steps to reproduce

<!-- Describe how one can reproduce the issue - this is very important. Please use an ordered list. -->

Right now, we aren't sure how to reproduce this.

### What is the current *bug* behavior?

<!-- Describe what actually happens. -->

Post-receive hooks don't fire. As a result, many other parts of GitLab stop working:

1. Pushes from a terminal don't provide a link to create a merge request.
1. Activity is not logged (such as in the **Project information** -> **Activity** page).
1. Cached repository information is stale. This results in at least a couple weird behaviors when in a new branch created while this bug is observed:
   - Clicking a file gives inactive **Edit** and **Web IDE** buttons, with a popover which states that **You can only edit files when you are on a branch**.
   - Merge requests have the following message next to the inactive **Merge** button:

     ```
     The source branch `new_branch` does not exist. Please restore it or use a different branch.
     ```

   These may be fixed by running `sudo gitlab-rake cache:clear`.
1. GitLab CI/CD pipelines are not triggered.
1. Webhooks/integrations are not triggered.

### What is the expected *correct* behavior?

<!-- Describe what you should see instead. -->

Post-receive hooks should fire. Gitaly logs should have entries that look like the following (on a single node instance):

```
{
  "content_length_bytes": 217,
  "correlation_id": "01FY7GRMGVMNN260XZB5NY3TPP",
  "duration_ms": 51,
  "level": "info",
  "method": "POST",
  "msg": "Finished HTTP request",
  "status": 200,
  "time": "2022-03-15T19:27:04.348Z",
  "url": "http://unix/api/v4/internal/post_receive"
}
```

Then, the rest of the GitLab functionality triggered by the built-in post-receive hooks should work.

### Relevant logs and/or screenshots

<!-- Paste any relevant logs - please use code blocks (```) to format console output, logs, and code as it's tough to read otherwise. -->

Right now, the most sure sign of this happening in the logs seems to be the absence of calls to the `post_receive` API endpoint.

### Output of checks

<!-- If you are reporting a bug on GitLab.com, write: This bug happens on GitLab.com -->

#### Results of GitLab environment info

<!-- Input any relevant GitLab environment information if needed. -->

<details>
<summary>Expand for output related to GitLab environment info</summary>

<pre>

(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:env:info`)

(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)

</pre>
</details>

#### Results of GitLab application Check

<!-- Input any relevant GitLab application check information if needed. -->

I will ask the customer to run the commands to fill out the information below, but their instance was a single node on 14.8.2, and not doing anything unusual.

<details>
<summary>Expand for output related to the GitLab application check</summary>

<pre>

(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:check SANITIZE=true`)

(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`)

(we will only investigate if the tests are passing)

</pre>
</details>

### Possible fixes

<!-- If you can, link to the line of code that might be responsible for the problem. -->

I've yet to track down how/where this could be happening.
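
#### Checking the logs for `post_receive` calls (sketch)

Since the clearest symptom described above is the absence of calls to the `post_receive` API endpoint, a small script may help confirm whether an instance is affected. The following is a minimal sketch, not an official tool: it assumes the Gitaly log is JSON-lines with the `url`, `status`, and `time` fields shown in the sample entry above. The function name `last_post_receive` and the log path in the usage note are illustrative assumptions.

```python
import json
import sys
from datetime import datetime, timezone

# Path suffix taken from the sample log entry in this issue.
POST_RECEIVE_PATH = "/api/v4/internal/post_receive"


def last_post_receive(lines):
    """Return the time of the newest successful post_receive call, or None.

    `lines` is any iterable of log lines. Lines that are not JSON objects,
    or that lack the expected fields, are skipped.
    """
    newest = None
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text or truncated line
        if not isinstance(entry, dict):
            continue
        url = entry.get("url")
        if not isinstance(url, str) or not url.endswith(POST_RECEIVE_PATH):
            continue
        if entry.get("status") != 200:
            continue
        try:
            # Timestamps in the sample look like "2022-03-15T19:27:04.348Z".
            seen = datetime.fromisoformat(str(entry["time"]).replace("Z", "+00:00"))
        except (KeyError, ValueError):
            continue
        if newest is None or seen > newest:
            newest = seen
    return newest


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: check_post_receive.py <gitaly-log-file>", file=sys.stderr)
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            newest = last_post_receive(log)
        if newest is None:
            print("no successful post_receive calls found")
        else:
            age = datetime.now(timezone.utc) - newest
            print(f"last post_receive call: {newest.isoformat()} ({age} ago)")
```

On an Omnibus install, the log file is likely `/var/log/gitlab/gitaly/current` (an assumption; adjust for your setup). If the script reports no calls, or only very old ones, after a fresh `git push`, that is consistent with the behavior described in this issue.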