GitLab Workhorse gets stuck if it can't write to logs
Summary
We have a customer whose /opt/gitlab/embedded/bin/svlogd binary got corrupted and was turned into an empty file. This causes gitlab-workhorse to hang when it is started with `gitlab-ctl start gitlab-workhorse`. Starting the Workhorse process manually works.
In the strace of the stuck process, we observed that it hangs while trying to print its first log line, which includes the build time.
gitlab-workhorse also appears to be the only service broken by the corruption of svlogd.
Steps to reproduce
- Stop GitLab: `gitlab-ctl stop && systemctl stop gitlab-runsvdir`
- Truncate /opt/gitlab/embedded/bin/svlogd: `truncate -s 0 /opt/gitlab/embedded/bin/svlogd`
- Restart the server.
- gitlab-workhorse does not start correctly and hangs at these lines in strace:
[pid 552789] futex(0xc000062948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 552788] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 552786] write(2, "{\"build_time\":\"20240424.111925\","..., 118 <unfinished ...>
[pid 552792] futex(0x27b8df8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 552786] write(2, "{\"build_time\":\"20240424.111925\","..., 118 <unfinished ...>
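The `{"build_time":...}` entry being written to file descriptor 2 (stderr) is the first log line gitlab-workhorse attempts to print, and the write never completes (`<unfinished ...>`).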
Example Project
What is the current bug behavior?
GitLab Workhorse does not handle being unable to write its logs: it blocks on the first log write and never finishes starting.
What is the expected correct behavior?
GitLab Workhorse should keep operating normally even if it cannot write to its logs.
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
(For installations with the omnibus-gitlab package, run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source, run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
(For installations with the omnibus-gitlab package, run and paste the output of: `sudo gitlab-rake gitlab:check SANITIZE=true`) (For installations from source, run and paste the output of: `sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`) (We will only investigate if the tests are passing.)