Gitaly hook symlinks may be deleted after 10 days of uptime when 'noatime' is set on /tmp, preventing all tasks executed by hooks from running
### Summary
Update: See [Will's comment](https://gitlab.com/gitlab-org/gitaly/-/issues/4113#note_877567514) for the root cause. Setting `noatime` on `/tmp`, or long periods of inactivity, can result in the Git hook symlinks being deleted. Amazon Linux sets `noatime` on `/tmp` by default. **The current workarounds to prevent this** are to remove `noatime`, modify the service that deletes `/tmp` files, or `touch` the files periodically. On most distros, the simplest workaround is to [override the `tmpfiles.d` service behavior](https://gitlab.com/gitlab-org/gitaly/-/issues/4113#note_900051517). The original description follows.
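To see which entries an age-based `/tmp` cleaner might treat as stale, a rough sketch like the following may help. It is an illustration only: `stale_entries` is a hypothetical helper, the directory to scan is whatever you pass in (this issue does not name the exact path), the 10-day threshold comes from the issue title, and judging age by the newest of atime, mtime, and ctime is an assumption about how the cleanup service works on your distro.

```python
import os
import time

# Threshold taken from the issue title (10 days); adjust to match your cleaner.
TEN_DAYS = 10 * 24 * 60 * 60


def stale_entries(directory, max_age_seconds=TEN_DAYS, now=None):
    """Return sorted names in `directory` whose newest timestamp exceeds the age limit.

    Assumption: the cleaner treats an entry as stale when the newest of its
    atime, mtime, and ctime is older than the limit.
    """
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(directory):
        # Inspect the symlink itself, not the file it points to.
        st = entry.stat(follow_symlinks=False)
        newest = max(st.st_atime, st.st_mtime, st.st_ctime)
        if now - newest > max_age_seconds:
            stale.append(entry.name)
    return sorted(stale)
```

Calling `stale_entries(path)` on the directory that holds the hook symlinks returns the names that would already be past the limit. With `noatime` set, reads no longer refresh atime, so entries that are used but never modified can end up on this list.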
On GitLab self-managed instances (two cases that I know of so far, both on 14.8.2), Gitaly sometimes stops firing post-receive hooks across the entire instance, which breaks every feature that depends on them.
I learned about this when working with a customer on a [support ticket (internal)](https://gitlab.zendesk.com/agent/tickets/274405), including an extended call. Additionally, @wchandler was able to reproduce it briefly.
#### Potential workaround
Restarting Gitaly may be enough to work around the issue for now:
```
sudo gitlab-ctl restart gitaly
```
### Steps to reproduce
Right now, we aren't sure how to reproduce this.
### What is the current *bug* behavior?
Post-receive hooks don't fire. As a result, many other parts of GitLab stop working:
1. Pushes from a terminal don't provide a link to create a merge request.
1. Activity is not logged (such as in the **Project information** -> **Activity** page).
1. Cached repository information is stale.
   This results in at least a couple of odd behaviors on a new branch created while the bug is occurring:
   - Clicking a file gives inactive **Edit** and **Web IDE** buttons, with a popover which states that **You can only edit files when you are on a branch**.
   - Merge requests have the following message next to the inactive **Merge** button:
     ```
     The source branch `new_branch` does not exist. Please restore it or use a different branch.
     ```
   These may be fixed by running `sudo gitlab-rake cache:clear`.
1. GitLab CI/CD pipelines are not triggered.
1. Webhooks/integrations are not triggered.
### What is the expected *correct* behavior?
Post-receive hooks should fire. Gitaly logs should have entries that look like the following (on a single node instance):
```
{
"content_length_bytes": 217,
"correlation_id": "01FY7GRMGVMNN260XZB5NY3TPP",
"duration_ms": 51,
"level": "info",
"method": "POST",
"msg": "Finished HTTP request",
"status": 200,
"time": "2022-03-15T19:27:04.348Z",
"url": "http://unix/api/v4/internal/post_receive"
}
```
Then, the rest of the GitLab functionality triggered by the built-in post-receive hooks should work.
### Relevant logs and/or screenshots
Right now, the surest sign of this in the logs seems to be the absence of calls to the `post_receive` API endpoint.
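One way to check for that absence is to scan the log for the most recent `post_receive` request. The sketch below is a rough aid, not a supported tool: `last_post_receive` is a hypothetical helper, and it assumes the log holds one JSON object per line with the same `url` and `time` fields as the sample entry above. The log file location is not given in this issue, so pass in lines from wherever your Gitaly logs live.

```python
import json

# Path suffix of the internal API call made by the post-receive hook,
# as seen in the sample log entry in this issue.
POST_RECEIVE_SUFFIX = "/api/v4/internal/post_receive"


def last_post_receive(lines):
    """Return the `time` of the most recent post_receive request, or None."""
    latest = None
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        if not str(entry.get("url", "")).endswith(POST_RECEIVE_SUFFIX):
            continue
        ts = entry.get("time")
        # Timestamps in this fixed ISO 8601 format sort correctly as strings.
        if isinstance(ts, str) and (latest is None or ts > latest):
            latest = ts
    return latest
```

For example, `last_post_receive(open(path))` returns the timestamp of the last hook call recorded in that file. A result of `None`, or a timestamp much older than the latest push, would be consistent with this bug.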
### Output of checks
#### Results of GitLab environment info
<details>
<summary>Expand for output related to GitLab environment info</summary>
<pre>
(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:env:info`)
(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
</pre>
</details>
#### Results of GitLab application Check
I will ask the customer to run the commands below and fill in the output. Their instance was a single node on 14.8.2 and wasn't doing anything unusual.
<details>
<summary>Expand for output related to the GitLab application check</summary>
<pre>
(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:check SANITIZE=true`)
(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`)
(we will only investigate if the tests are passing)
</pre>
</details>
### Possible fixes
I've yet to track down how/where this could be happening.