Invalid Git repository on Windows shell CI runners: GitLab Runner does not zap and re-create it, and does not tell me which folder/path is invalid
We periodically see that Git repositories cloned or recently updated (fetched) by gitlab-ci-runner are invalid, and we get build failures like this:
Running with gitlab-ci-multi-runner {1.4.3,1.6,dev (HEAD)} (c7ed472)
Using Shell executor...
Running on DEV-HTML5...
Fetching changes...
fatal: Not a git repository (or any of the parent directories): .git
fatal: Not a git repository (or any of the parent directories): .git
fatal: Not a git repository (or any of the parent directories): .git
fatal: Not a git repository (or any of the parent directories): .git
Checking out 9b3d175f as master...
fatal: Not a git repository (or any of the parent directories): .git
ERROR: Build failed: exit status 128
I have tested this in 1.4, 1.5, 1.6 and against a git-head version built yesterday and the behaviour above repeats each time I retry.
In my opinion, this should have triggered the same action Jenkins CI would have taken: wipe the offending working directory and re-clone it. Either such logic does not exist or it is not being triggered. Having to zap folders manually to get a fallen-over CI runner working again seems unacceptable. Even manual retries fail with the same persistent failure.
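To make the expected behaviour concrete, here is a minimal Python sketch of the wipe-and-re-clone logic. It is not the runner's actual code: the function names, the `max_attempts` parameter, and the log messages are all hypothetical. It assumes that `git rev-parse --git-dir` prints `.git` when run at the top of a normal clone, and that a plain `git clone` is an acceptable way to rebuild the folder.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable


def is_valid_working_copy(workdir: Path) -> bool:
    """Ask git whether workdir itself is the top of a working copy."""
    if not workdir.is_dir():
        return False
    result = subprocess.run(
        ["git", "rev-parse", "--git-dir"],
        cwd=str(workdir),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    # At the top of a normal clone git prints the relative path ".git".
    # Any other answer means the repository was found in a parent
    # directory (or not at all), so this folder is not a usable clone.
    return result.returncode == 0 and result.stdout.strip() == b".git"


def git_clone(url: str, workdir: Path) -> None:
    """Clone url into workdir; raises CalledProcessError on failure."""
    subprocess.run(["git", "clone", url, str(workdir)], check=True)


def ensure_working_copy(
    url: str,
    workdir: Path,
    max_attempts: int = 2,  # hypothetical knob, not a runner setting
    is_valid: Callable[[Path], bool] = is_valid_working_copy,
    clone: Callable[[str, Path], None] = git_clone,
) -> bool:
    """Return True once workdir is a valid clone, wiping it if needed."""
    if is_valid(workdir):
        return True
    for attempt in range(1, max_attempts + 1):
        # Always print the full path so an admin can find the folder.
        print(f"Invalid git working copy: {workdir} "
              f"(wipe and re-clone, attempt {attempt}/{max_attempts})")
        # On Windows, read-only files or an anti-virus lock can make the
        # delete incomplete; the clone then fails and we try again.
        shutil.rmtree(str(workdir), ignore_errors=True)
        try:
            clone(url, workdir)
        except (subprocess.CalledProcessError, OSError) as exc:
            print(f"Re-clone into {workdir} failed: {exc}")
            continue
        if is_valid(workdir):
            return True
    return False
```

The validity check and the clone step are passed in as parameters only so the retry logic can be exercised without a real repository; a caller would normally use the defaults and fail the build when the function returns `False`.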
Typically developers are not granted SSH access to the GitLab Runners on Linux (nor do they want it), nor Remote Desktop access to a Windows runner. Even GitLab admins (me) will find fixing this a pain when there are 400+ random hexadecimal working folders and I don't know which one is corrupt.
I see three problems:
- The underlying failure the user experiences may NOT be fixable by changing the GitLab CI runner code. We cannot conclude that it is a gitlab-ci-runner bug, because something else could have interfered with the runner: for example, Microsoft anti-virus reading the git working copy and holding a filesystem lock as a side effect (a common issue on Windows). It could also be a bug in the git binary itself.
- What I believe IS the gitlab-ci-runner's job is to detect a broken working copy and re-clone it, either on the next build or within N retries after the "not a valid git working copy" status is detected.
- A workaround would be easier if the runner revealed the full path of the invalid working copy. How am I, as a GitLab admin, to find the offending working copy and purge it manually? What a pain! The raw output does not include the workspace (git repo) folder path!
Running with gitlab-ci-multi-runner 1.4.3 (c7ed472)
Using Shell executor...
Running on DEV-HTML5...
Fetching changes...
fatal: Not a git repository (or any of the parent directories): .git
fatal: Not a git repository (or any of the parent directories): .git
fatal: Not a git repository (or any of the parent directories): .git
fatal: Not a git repository (or any of the parent directories): .git
Checking out 9b3d175f as master...
fatal: Not a git repository (or any of the parent directories): .git
ERROR: Build failed: exit status 128
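Until the runner prints the path itself, the third problem can be worked around with a small scan of the builds folder. The following is a rough Python sketch, not anything shipped with the runner. It uses a filesystem heuristic only (a clone should have a `.git` directory containing `HEAD`, `objects` and `refs`), so it will not catch every kind of corruption. The function names and the glob `pattern` argument are hypothetical; the pattern has to be adjusted to however deep the working folders sit under the builds root on a given runner.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def working_copy_problem(workdir: Path) -> Optional[str]:
    """Return a short reason if workdir does not look like a git clone."""
    git_dir = workdir / ".git"
    if not git_dir.is_dir():
        return "missing .git directory"
    if not (git_dir / "HEAD").is_file():
        return "missing .git/HEAD"
    for name in ("objects", "refs"):
        if not (git_dir / name).is_dir():
            return f"missing .git/{name}"
    return None  # looks structurally intact (heuristic only)


def find_broken_working_copies(
    builds_root: Path, pattern: str
) -> List[Tuple[Path, str]]:
    """List (full path, reason) for every candidate folder that looks broken.

    pattern is a glob relative to builds_root that selects the folders
    expected to be working copies; it is an assumption about the layout.
    """
    broken = []
    for candidate in sorted(builds_root.glob(pattern)):
        if not candidate.is_dir():
            continue
        reason = working_copy_problem(candidate)
        if reason is not None:
            broken.append((candidate.resolve(), reason))
    return broken
```

Calling `find_broken_working_copies(Path(builds_root), "*/*")` and printing each returned pair gives an admin the full path of every suspect folder, which is exactly the information missing from the job log above. The `"*/*"` pattern here is only an illustration of a two-level layout.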