Geo: Long-running git fetch process can be interrupted by another
I attempted to fetch a project (2376445) in the failure list. After about an hour, I saw this:
Fetching remote geo for repository repo.git.
Fetching remote geo for repository repo.git failed.
error: cannot lock ref 'refs/tags/AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.098': Unable to create '/git-data-file01/repositories/namespace/project/./refs/tags/AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.098.lock': File exists.
Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
You can see that multiple RepositorySyncService jobs ran (https://109d3390fac34f3248388dc3d64a49da.us-central1.gcp.cloud.es.io:9243/goto/a0de2f83a24bddcf85685c70a72d404b):
A couple of questions:
- Why did this fetch take so long to complete? This project appears to have thousands of tags (more than 8000), so the long fetch may be related to https://gitlab.com/gitlab-org/gitlab-ee/issues/4894. An strace showed a similar signature.
- Should we have more of a guard against this (e.g. an exclusive lease that refreshes if the job is still active)? This may also relate to #4897 (closed).
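To make the second question concrete, here is a minimal sketch of a lease that is renewed while the job is still active. It is written in Python purely for illustration and is not how RepositorySyncService or GitLab's lease code works. `LeaseStore`, `run_with_lease`, and the `ttl` / `renew_every` parameters are all hypothetical names, and the in-memory dict stands in for whatever shared store (e.g. Redis) a real implementation would use.

```python
import threading
import time
import uuid


class LeaseStore:
    """In-memory stand-in for a shared lease store (hypothetical, single process only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self._leases = {}  # key -> (owner token, expiry time)

    def try_obtain(self, key, ttl):
        """Return a fresh owner token, or None if an unexpired lease exists."""
        with self._lock:
            held = self._leases.get(key)
            if held is not None and held[1] > self._clock():
                return None
            token = uuid.uuid4().hex
            self._leases[key] = (token, self._clock() + ttl)
            return token

    def renew(self, key, token, ttl):
        """Extend the lease only if `token` still owns it and it has not expired."""
        with self._lock:
            held = self._leases.get(key)
            if held is None or held[0] != token or held[1] <= self._clock():
                return False
            self._leases[key] = (token, self._clock() + ttl)
            return True

    def release(self, key, token):
        """Drop the lease, but only if `token` is the current owner."""
        with self._lock:
            held = self._leases.get(key)
            if held is not None and held[0] == token:
                del self._leases[key]


def run_with_lease(store, key, job, ttl=60.0, renew_every=20.0):
    """Run job() under an exclusive lease that is renewed while the job runs.

    Returns (True, result) if the job ran, or (False, None) if another
    holder already had the lease.
    """
    token = store.try_obtain(key, ttl)
    if token is None:
        return (False, None)

    done = threading.Event()

    def heartbeat():
        # Event.wait() returns False on timeout, meaning the job is still running.
        while not done.wait(renew_every):
            if not store.renew(key, token, ttl):
                break  # lease lost; a real job would probably want to abort here

    beater = threading.Thread(target=heartbeat, daemon=True)
    beater.start()
    try:
        return (True, job())
    finally:
        done.set()
        beater.join()
        store.release(key, token)
```

With something along these lines, a second sync job for the same repository would get `(False, None)` and could skip or reschedule instead of starting a competing `git fetch`, no matter how long the first fetch takes. The lease still expires on its own if the worker dies, because the heartbeat dies with it.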
I wonder if the --prune argument to git fetch may have something to do with it. We should also try using Gitaly to fetch the remote directly on disk.