Geo::RepositorySyncService: fetch remote: getting Git version: cannot stat Git binary: stat /var/opt/gitlab/gitaly/run/gitaly-11306/git-exec-3671663053.d/git: no such file or directory
Upgrade to 16.1.6 (from 15.11.13) broke Geo replication for all secondary nodes.
{"severity":"ERROR","time":"2024-01-13T22:03:36.732Z","correlation_id":"4045ea7fff01d64c713c46810dcd9b3b","class":"Geo::RepositorySyncService","gitlab_host":"secondary","message":"Error syncing repository","project_id":574,"project_path":"project1","storage_version":2,"error":"13:fetch remote: getting Git version: cannot stat Git binary: stat /var/opt/gitlab/gitaly/run/gitaly-11306/git-exec-3671663053.d/git: no such file or directory."}
{"severity":"ERROR","time":"2024-01-13T22:03:36.737Z","correlation_id":"b9187600f20bfc9fd598f155717a7ab2","class":"Geo::RepositorySyncService","gitlab_host":"secondary","message":"Error syncing repository","project_id":4168,"project_path":"project2","storage_version":2,"error":"13:fetch remote: getting Git version: cannot stat Git binary: stat /var/opt/gitlab/gitaly/run/gitaly-11306/git-exec-3671663053.d/git: no such file or directory."}
{"severity":"ERROR","time":"2024-01-13T22:03:42.816Z","correlation_id":"cc3c05ce2e52bb4b489a94e4b7ec4198","class":"Geo::RepositorySyncService","gitlab_host":"secondary","message":"Error syncing repository","project_id":3721,"project_path":"project3","storage_version":2,"error":"13:fetch remote: getting Git version: cannot stat Git binary: stat /var/opt/gitlab/gitaly/run/gitaly-11306/git-exec-3671663053.d/git: no such file or directory."}
(secondary) # ls -l /var/opt/gitlab/gitaly/run/gitaly-11306/git-exec-3671663053.d/
total 0
lrwxrwxrwx 1 git git 41 Dec 8 13:53 git -> /opt/gitlab/embedded/bin/gitaly-git-v2.39
lrwxrwxrwx 1 git git 54 Dec 8 13:53 git-http-backend -> /opt/gitlab/embedded/bin/gitaly-git-http-backend-v2.39
lrwxrwxrwx 1 git git 41 Dec 8 13:53 git-receive-pack -> /opt/gitlab/embedded/bin/gitaly-git-v2.39
lrwxrwxrwx 1 git git 53 Dec 8 13:53 git-remote-ftp -> /opt/gitlab/embedded/bin/gitaly-git-remote-http-v2.39
lrwxrwxrwx 1 git git 53 Dec 8 13:53 git-remote-ftps -> /opt/gitlab/embedded/bin/gitaly-git-remote-http-v2.39
lrwxrwxrwx 1 git git 53 Dec 8 13:53 git-remote-http -> /opt/gitlab/embedded/bin/gitaly-git-remote-http-v2.39
lrwxrwxrwx 1 git git 53 Dec 8 13:53 git-remote-https -> /opt/gitlab/embedded/bin/gitaly-git-remote-http-v2.39
lrwxrwxrwx 1 git git 41 Dec 8 13:53 git-upload-archive -> /opt/gitlab/embedded/bin/gitaly-git-v2.39
lrwxrwxrwx 1 git git 41 Dec 8 13:53 git-upload-pack -> /opt/gitlab/embedded/bin/gitaly-git-v2.39
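The symlinks above are dangling: their v2.39 targets were removed by the upgrade. A quick way to confirm this mechanically (a sketch; the demo directory below is fabricated, on a real node you would point `find` at `/var/opt/gitlab/gitaly/run/gitaly-*/git-exec-*.d/`):

```shell
# GNU find's -xtype l matches symlinks whose target no longer exists.
demo=$(mktemp -d)
# Simulate the post-upgrade state: a link to a binary that is no longer installed.
ln -s /opt/gitlab/embedded/bin/gitaly-git-v2.39 "$demo/git"
find "$demo" -xtype l    # prints each broken link
```

On the affected secondary this prints every symlink in the exec dir, since all of them point at the removed v2.39 binaries.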
(secondary) # ls -l /opt/gitlab/embedded/bin/gitaly-git*
-rwxr-xr-x 1 root root 2568216 Jan 11 04:07 /opt/gitlab/embedded/bin/gitaly-git-http-backend-v2.40
-rwxr-xr-x 1 root root 2586600 Jan 11 04:07 /opt/gitlab/embedded/bin/gitaly-git-http-backend-v2.41
-rwxr-xr-x 1 root root 2623392 Jan 11 04:07 /opt/gitlab/embedded/bin/gitaly-git-remote-http-v2.40
-rwxr-xr-x 1 root root 2642016 Jan 11 04:07 /opt/gitlab/embedded/bin/gitaly-git-remote-http-v2.41
-rwxr-xr-x 1 root root 4248872 Jan 11 04:07 /opt/gitlab/embedded/bin/gitaly-git-v2.40
-rwxr-xr-x 1 root root 4267768 Jan 11 04:07 /opt/gitlab/embedded/bin/gitaly-git-v2.41
The problem is clear: Gitaly was never restarted during the upgrade, so the symlinks in its runtime directory still point at the v2.39 binaries, which the upgrade replaced with v2.40 and v2.41.
(secondary) # gitlab-ctl status
run: alertmanager: (pid 11122) 21191s; run: log: (pid 10922) 21235s
run: geo-logcursor: (pid 11071) 21193s; run: log: (pid 10966) 21232s
run: geo-postgresql: (pid 728) 3141246s; run: log: (pid 10926) 21234s
run: gitaly: (pid 716) 3141246s; run: log: (pid 10766) 21285s # <--- ~3000000s vs others that are ~20000s
run: gitlab-exporter: (pid 11147) 21191s; run: log: (pid 10916) 21237s
run: gitlab-kas: (pid 11025) 21194s; run: log: (pid 10794) 21282s
run: gitlab-pages: (pid 11093) 21192s; run: log: (pid 10892) 21240s
run: gitlab-workhorse: (pid 11073) 21193s; run: log: (pid 10854) 21241s
run: logrotate: (pid 22924) 3190s; run: log: (pid 704) 3141246s
run: nginx: (pid 11158) 21190s; run: log: (pid 10877) 21240s
run: node-exporter: (pid 11227) 21190s; run: log: (pid 10913) 21238s
run: postgres-exporter: (pid 11233) 21189s; run: log: (pid 10924) 21234s
run: postgresql: (pid 720) 3141246s; run: log: (pid 10772) 21283s
run: prometheus: (pid 11241) 21189s; run: log: (pid 10920) 21236s
run: puma: (pid 11290) 21188s; run: log: (pid 10850) 21243s
run: redis: (pid 13054) 21129s; run: log: (pid 10760) 21286s
run: redis-exporter: (pid 11309) 21188s; run: log: (pid 10918) 21237s
run: registry: (pid 11106) 21192s; run: log: (pid 10894) 21239s
run: sidekiq: (pid 11317) 21188s; run: log: (pid 10852) 21242s
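The stale service stands out as an uptime outlier in the status output. As a rough sketch of how it could be flagged automatically (the parsing of the runit-style status lines and the 10x-of-minimum threshold are my assumptions, not an existing `gitlab-ctl` feature):

```shell
# Flag services whose uptime dwarfs that of the most recently restarted service.
flag_stale_services() {
    awk '/^run:/ {
        name = $2; sub(/:$/, "", name)          # service name
        up = $5; sub(/s;?$/, "", up); up += 0   # uptime in seconds
        uptime[name] = up
        if (min == "" || up < min) min = up
    }
    END {
        for (n in uptime)
            if (uptime[n] > 10 * min)
                printf "%s: up %ss, but newest service is only %ss old\n", n, uptime[n], min
    }'
}

# Two lines from the secondary above; gitaly is the outlier.
printf '%s\n' \
    'run: gitaly: (pid 716) 3141246s; run: log: (pid 10766) 21285s' \
    'run: redis: (pid 13054) 21129s; run: log: (pid 10760) 21286s' \
    | flag_stale_services
```

In practice you would pipe `gitlab-ctl status` into `flag_stale_services`; on the secondary above it reports only gitaly.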
When running `gitlab-ctl reconfigure`, it usually tells you to restart certain services if it detects that the running version doesn't match the installed one; this definitely happens for Redis.
It would be good if a stale running Gitaly were detected too, with a restart suggestion printed in the same manner, to avoid wasted time on root-cause analysis.
At the same time, I see that gitaly was restarted on the primary during the upgrade.
(primary) # gitlab-ctl status
run: alertmanager: (pid 2881766) 93385s; run: log: (pid 2881618) 93414s
run: gitaly: (pid 2881840) 93384s; run: log: (pid 2881352) 93520s # <---
run: gitlab-exporter: (pid 2881789) 93385s; run: log: (pid 2881612) 93416s
run: gitlab-kas: (pid 2881670) 93387s; run: log: (pid 2881474) 93516s
run: gitlab-pages: (pid 2881740) 93387s; run: log: (pid 2881590) 93418s
run: gitlab-workhorse: (pid 2881719) 93387s; run: log: (pid 2881554) 93420s
run: logrotate: (pid 176980) 3257s; run: log: (pid 4071048) 21804493s
run: mailroom: (pid 227919) 601s; run: log: (pid 2881576) 93420s
run: nginx: (pid 2881802) 93385s; run: log: (pid 2881578) 93420s
run: node-exporter: (pid 2881852) 93385s; run: log: (pid 2881609) 93417s
run: postgres-exporter: (pid 2881860) 93384s; run: log: (pid 2881620) 93414s
run: postgresql: (pid 3519968) 12800139s; run: log: (pid 2881357) 93519s
run: prometheus: (pid 2881869) 93384s; run: log: (pid 2881616) 93415s
run: puma: (pid 2881907) 93383s; run: log: (pid 2881549) 93422s
run: redis: (pid 2899057) 92875s; run: log: (pid 2881346) 93522s
run: redis-exporter: (pid 2881914) 93383s; run: log: (pid 2881614) 93416s
run: registry: (pid 2881751) 93387s; run: log: (pid 2881592) 93418s
run: sidekiq: (pid 2881922) 93383s; run: log: (pid 2881551) 93422s
So something is not quite right with the upgrade scripts when they run on secondary nodes.