Handle repos with a stuck HEAD.lock better
Problem to solve
We just spent an hour debugging why all merges were failing on one of our repos. Pressing the merge button resulted only in a 'Could not update branch master' error in the merge widget.
Checking Sentry, the Sidekiq logs, and metrics didn't tell us anything useful about what was really happening. After unprotecting master and trying to push, we got this:
remote:
remote: Another git process seems to be running in this repository, e.g.
remote: an editor opened by 'git commit'. Please make sure all processes
remote: are terminated then try again. If it still fails, a git process
remote: may have crashed in this repository earlier:
remote: remove the file manually to continue.
SSHing into a GitLab container and removing HEAD.lock fixed merges.
Proposal
A couple of things could be done to make this easier to solve:
- Not sure what exactly failed when that error was displayed in the merge button widget, but whatever it is, it should create a Sentry event when it fails
- The web UI should probably display the whole 'Another git process seems to be running in this repository […]' error, not just 'Could not update branch master'
- Tracking down and fixing how this lock got stuck would be nice too, but is probably unrealistically difficult. The cause might be that we use NFS with lookupcache=positive for better performance, so maybe it's even expected to fail like this
- But anyway, detecting when the repo is locked, displaying it in the web UI, and maybe even providing a button in the UI to force-remove the lock would be nice
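As a rough illustration of the last point, here is a minimal sketch of what "detect a stuck lock and offer to remove it" could look like at the filesystem level. It is not how GitLab does or would implement this. The function names, the idea of using the lock file's modification time, and the 10-minute threshold are all assumptions made for the example, and on NFS the mtime may be skewed relative to the local clock.

```python
import time
from pathlib import Path
from typing import Optional

LOCK_NAME = "HEAD.lock"  # the lock file that was stuck in our case


def head_lock_age(repo_dir: str, now: Optional[float] = None) -> Optional[float]:
    """Return the age in seconds of HEAD.lock, or None if no lock file exists."""
    lock = Path(repo_dir) / LOCK_NAME
    try:
        mtime = lock.stat().st_mtime
    except FileNotFoundError:
        return None
    current = time.time() if now is None else now
    return current - mtime


def remove_stale_head_lock(repo_dir: str, max_age_seconds: float = 600.0) -> bool:
    """Delete HEAD.lock only if it is older than max_age_seconds.

    Returns True if a lock was removed. The threshold is a hypothetical
    safety margin: a fresh lock most likely belongs to a running git process.
    """
    age = head_lock_age(repo_dir)
    if age is None or age <= max_age_seconds:
        return False
    (Path(repo_dir) / LOCK_NAME).unlink(missing_ok=True)  # requires Python 3.8+
    return True
```

The web UI could show a warning whenever head_lock_age returns a large value, and the "force remove" button would map to something like remove_stale_head_lock. Removing the lock is only safe when no git process is actually still working in the repository, which is why the sketch refuses to touch a recent lock.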
What does success look like, and how can we measure that?
Imagine a new customer bumping into the same issue. Try to guess what they might do to resolve it. It should take no more than 5 minutes for them to identify the cause and fix the repository.