Clarification on stale lock files in @cluster/pools
Support Request for the Gitaly Team
The goal is to keep these requests public. However, if customer information is required for the support request, please be sure to mark this issue as confidential.
This request template is part of Gitaly Team's intake process.
Customer Information
Salesforce Link:
Zendesk Ticket:
https://gitlab.zendesk.com/agent/tickets/405115
Installation Size:
Medium
Architecture Information:
Kubernetes GitLab v15.9.3
Slack Channel:
Additional Information:
Support Request
Severity
Ticket was opened as a sev4 so tagging this issue as the same.
Looking at the metadata output in the ticket, the repositories with the problem are fully synced on at least 2 of the 3 nodes (although not consistently the same nodes), so they do have redundancy at present.
Problem Description
Note that they are running this Gitaly cluster in K8s. I have advised them that they need to consider moving it out of K8s when they can.
The customer raised a ticket for clarification on some errors. One was due to wiki default branch names and is resolved. The other was:
ReplicateRepository
finished unary call with code Internal
Another git process seems to be running in this repository
Further log diving shows errors like:
"error": "synchronizing repository: fetch internal remote: fetch: exit status 254, stderr: \"error: cannot lock ref 'refs/remotes/origin/heads/XXXX': Unable to create '/home/git/repositories/@cluster/pools/b8/00/YYYY/refs/remotes/origin/heads/XXXX.lock': File exists.\\n\\nAnother git process seems to be running in this repository, e.g.\\nan editor opened by 'git commit'. Please make sure all processes\\nare terminated then try again. If it still fails, a git process\\nmay have crashed in this repository earlier:\\nremove the file manually to continue.\\n\"",
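For anyone who wants to pull the affected lock paths out of a larger log export, here is a minimal sketch. It assumes only the `Unable to create '<path>.lock'` wording seen in the error above; the names `LOCK_RE` and `extract_lock_paths` are hypothetical helpers, not part of Gitaly or Git.

```python
import re
from typing import List

# Matches the path that git reports when it cannot create a ref lock file,
# based on the "Unable to create '<path>.lock'" wording in the error above.
LOCK_RE = re.compile(r"Unable to create '([^']+\.lock)'")


def extract_lock_paths(log_text: str) -> List[str]:
    """Return the distinct lock file paths named in the log text, in first-seen order."""
    seen: List[str] = []
    for path in LOCK_RE.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen
```

Feeding it the full log text gives a de-duplicated list of lock paths that can be compared against what is actually on disk.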
Further digging on the file system shows that some pool folders contain quite a few .lock files dated Apr 30, which is when they performed their restore into this new K8s environment.
.lock files in a @cluster/pools folder
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/environments/review-feature-ba-5inu7s/deployments/212.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/environments/at-renovate-aws-j-hsbwc9/deployments/1148.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/environments/at-integration-49-23i62s/deployments/836.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/environments/at-integration-d0-c0lqgk/deployments/1416.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/environments/at-integration-65-dz8qie/deployments/18.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/environments/at-integration-d7-zt7avj/deployments/1280.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/environments/at-develop-a0ea99-pcr4g8/deployments/748.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/environments/at-renovate-db-co-qjcz80/deployments/1399.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/environments/at-master-030b541-6kefi9/deployments/181.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/environments/at-integration-f9-z8hp92/deployments/1305.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/environments/at-feature-babo-4-cxqbg2/deployments/289.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/environments/at-feature-babo-4-cxqbg2/deployments/291.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/environments/at-feature-babos-cbaatt/deployments/743.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/environments/at-renovate-aws-j-pqlrfk/deployments/1117.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/keep-around/26fe3b3fcc9577d45b8e960a413c9102eaf4c867.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/keep-around/2331f8af756f0faef55167b1b0c7bb7aa355c75b.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/keep-around/3690a247cc62514b3a0bb8f4c0d750431260bc03.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/keep-around/0b127170c0eb06d7b3717a19d6cd8dd87592a327.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/keep-around/21c7b68a6eb8c1c7808c943b0232b1c05be9e8e9.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/keep-around/133120c462ba100cc3173e269c386746c98511f1.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/keep-around/1008c6c8edf5774f45918178b0f39f0ef98eabc5.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/keep-around/3511393aee31df0ddd5972f36c99416a925d29d7.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/keep-around/20035947b0973767321348e758ec18a22fe13c64.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/keep-around/108817091b61f9cd19eb75c43968acfbf22b2f3c.lock
-rw-r--r-- 1 git git 41 Apr 30 15:31 ./refs/remotes/origin/keep-around/1b58a1f84f48245e9a160e71b9fec534a832969c.lock
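A listing like the one above can be reproduced with a read-only sketch such as the following. It only reports `.lock` files under a repository's `refs` directory that are older than a chosen age and deletes nothing; whether removal is safe is exactly the open question for the Gitaly team. The function name `find_stale_locks` and the age threshold are assumptions for illustration.

```python
import os
import time
from typing import Iterator, Tuple


def find_stale_locks(repo_path: str, min_age_seconds: float) -> Iterator[Tuple[str, float]]:
    """Yield (path, age_in_seconds) for *.lock files under repo_path/refs
    whose modification time is at least min_age_seconds in the past.

    Read-only: this function never modifies or removes anything.
    """
    now = time.time()
    refs_dir = os.path.join(repo_path, "refs")
    for dirpath, _dirnames, filenames in os.walk(refs_dir):
        for name in filenames:
            if not name.endswith(".lock"):
                continue
            full_path = os.path.join(dirpath, name)
            try:
                age = now - os.stat(full_path).st_mtime
            except FileNotFoundError:
                # The lock was released between listing and stat; skip it.
                continue
            if age >= min_age_seconds:
                yield full_path, age


# Example usage (the pool path is a placeholder, not a real repository):
# for path, age in find_stale_locks("/path/to/pool/repository", 24 * 3600):
#     print(f"{age / 86400:.1f} days old: {path}")
```

Filtering by age keeps locks held by an in-flight Git process out of the report, as long as the threshold is comfortably longer than any legitimate operation.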
Troubleshooting Performed
This is a busy instance, so the logs from kubesos didn't cover a large period of time, but the customer can see many occurrences of the same error as above in their own log collections.
The nightly housekeeping task is running, and whilst I can see references to .lock files in that code, it seems to skip these folders, likely for a good reason!
What specifically do you need from the Gitaly team
The housekeeping tasks seem to avoid checking these folders so:
- Can we simply delete these lock files given the age?
- Is it intended that these folders are not scanned for lock files? I believe they're a special case?
- If we cannot just delete these what is the recommended path forwards?
Thanks!
Author Checklist
- Customer information provided
- Severity realistically set
- Clearly articulated what is needed from the Gitaly team to support your request by filling out the "What specifically do you need from the Gitaly team" section