GitLab EE container does not survive restarts
Summary
GitLab EE's Docker container has about a 90% chance of not recovering when it's restarted.
Steps to reproduce
Either docker restart gitlab or shutdown now -r (rebooting the host) triggers it; both work equally well.
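To put a number on the failure rate, the restart step can be scripted. This is only a rough sketch and not part of my original testing: it assumes the container is named gitlab and was started with a restart policy such as --restart always, so that Docker's RestartCount goes up whenever the container dies and is restarted automatically. The helper names restart_count and survives_restart and the 300-second default wait are hypothetical choices.

```shell
#!/bin/sh
# Sketch: restart the container once and report whether Docker had to
# auto-restart it again afterwards (i.e. whether it is in a restart loop).
# DOCKER can be overridden, e.g. for a dry run.
DOCKER="${DOCKER:-docker}"

restart_count() {
    # Number of times Docker's restart policy has restarted the container.
    "$DOCKER" inspect --format '{{.RestartCount}}' "$1"
}

survives_restart() {
    name="$1"
    settle="${2:-300}"   # seconds to wait before judging (assumed value)

    "$DOCKER" restart "$name" >/dev/null || return 2
    before=$(restart_count "$name") || return 2
    sleep "$settle"
    after=$(restart_count "$name") || return 2

    # Same count: no automatic restarts happened during the wait.
    [ "$after" -eq "$before" ]
}
```

For example, survives_restart gitlab 300 && echo survived || echo "restart loop" gives one sample; running it repeatedly gives an estimate of how often the container fails to come back. A stable RestartCount only shows that the container stopped crashing, not that the web UI is answering.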
Example Project
N/A
What is the current bug behavior?
GitLab fails to start for various reasons, ultimately leading to an infinite restart loop and the website being down.
Some of the reasons:
Preparing services...
Starting services...
Configuring GitLab package...
/opt/gitlab/embedded/bin/runsvdir-start: line 24: ulimit: pending signals: cannot modify limit: Operation not permitted
/opt/gitlab/embedded/bin/runsvdir-start: line 37: /proc/sys/fs/file-max: Read-only file system
Malformed configuration JSON file found at /opt/gitlab/embedded/nodes/[redacted].json.
This usually happens when your last run of `gitlab-ctl reconfigure` didn't complete successfully.
This file is used to check if any of the unsupported configurations are enabled,
and hence require a working reconfigure before upgrading.
Please run `sudo gitlab-ctl reconfigure` to fix it and try again.
or
Running handlers:
Running handlers complete
There was an error running gitlab-ctl reconfigure:
execute[reload nginx] (nginx::enable line 26) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of gitlab-ctl hup nginx ----
STDOUT: fail: nginx: runsv not running
STDERR:
---- End output of gitlab-ctl hup nginx ----
Ran gitlab-ctl hup nginx returned 1
Manually running docker exec -it gitlab update-permissions && docker exec -it gitlab gitlab-ctl reconfigure has a small chance of recovering from this state (if I get to run it at all, since the container is in a restart loop; I usually run docker restart -t 300 gitlab to give myself and Chef more time). More often than not, however, it doesn't help, and I get this:
Running handlers:
There was an error running gitlab-ctl reconfigure:
Multiple failures occurred:
* Mixlib::ShellOut::ShellCommandFailed occurred in chef run: execute[reload nginx] (nginx::enable line 26) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of gitlab-ctl hup nginx ----
STDOUT: fail: nginx: runsv not running
STDERR:
---- End output of gitlab-ctl hup nginx ----
Ran gitlab-ctl hup nginx returned 1
* Mixlib::ShellOut::ShellCommandFailed occurred in delayed notification: service[gitlab-workhorse] (gitlab::gitlab-workhorse line 227) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /opt/gitlab/embedded/bin/chpst -u root /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitlab-workhorse ----
STDOUT: fail: /opt/gitlab/service/gitlab-workhorse: runsv not running
STDERR:
---- End output of /opt/gitlab/embedded/bin/chpst -u root /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitlab-workhorse ----
Ran /opt/gitlab/embedded/bin/chpst -u root /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitlab-workhorse returned 1
Running handlers complete
Chef Client failed. 41 resources updated in 09 seconds
However, there is one method that reliably recovers my GitLab instance every time: docker stop gitlab, docker rm gitlab, and then docker run with the parameters taken from https://docs.gitlab.com/omnibus/docker/ (I also have to add --cap-add=SYS_TIME to work around another problem).
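Spelled out as a script, the workaround looks roughly like the sketch below. The hostname, published ports, volume paths, and image tag are placeholders modelled on the linked documentation page, not the exact values from my setup, and recreate_gitlab is a hypothetical helper name. It relies on the configuration, logs, and data living in host volumes, so removing the container does not remove them.

```shell
#!/bin/sh
# Sketch: throw away the container and create a fresh one on the same volumes.
# DOCKER can be overridden, e.g. DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

recreate_gitlab() {
    name="${1:-gitlab}"

    "$DOCKER" stop "$name" &&
    "$DOCKER" rm "$name" &&
    "$DOCKER" run --detach \
        --hostname gitlab.example.com \
        --publish 443:443 --publish 80:80 --publish 22:22 \
        --name "$name" \
        --restart always \
        --cap-add=SYS_TIME \
        --volume /srv/gitlab/config:/etc/gitlab \
        --volume /srv/gitlab/logs:/var/log/gitlab \
        --volume /srv/gitlab/data:/var/opt/gitlab \
        gitlab/gitlab-ee:latest
    # All values after "run" except --name and --cap-add=SYS_TIME are
    # placeholder assumptions; substitute the ones used for the real instance.
}
```

Running DOCKER=echo first and then calling recreate_gitlab gitlab prints the three commands without executing them, which is a cheap way to check the parameters before touching the real container.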
What is the expected correct behavior?
That GitLab survives restarts, expected or unexpected. A minute or so of the usual 502 is less than ideal but OK.
Relevant logs and/or screenshots
Log snippets already posted above.
Output of checks
I'm not entirely sure what this means, but the issue can be reproduced with a 100% fresh installation of GitLab that I'll assume is OK. You'll see in my environment info that LDAP is enabled but this happens without LDAP as well.
Results of GitLab environment info
System information
System:
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.4.4p296
Gem Version: 2.7.6
Bundler Version: 1.16.2
Rake Version: 12.3.1
Redis Version: 3.2.11
Git Version: 2.17.1
Sidekiq Version: 5.1.3
Go Version: unknown

GitLab information
Version: 11.1.4-ee
Revision: d17962f
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: postgresql
DB Version: 9.6.8
URL: [redacted]
HTTP Clone URL: [redacted]/some-group/some-project.git
SSH Clone URL: git@[redacted]:some-group/some-project.git
Elasticsearch: no
Geo: no
Using LDAP: yes
Using Omniauth: no

GitLab Shell
Version: 7.1.4
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
Hooks: /opt/gitlab/embedded/service/gitlab-shell/hooks
Git: /opt/gitlab/embedded/bin/git
Results of GitLab application Check
Checking GitLab Shell ...

GitLab Shell version >= 7.1.4 ? ... OK (7.1.4)
Repo base directory exists?
default... yes
Repo storage directories are symlinks?
default... no
Repo paths owned by git:root, or git:git?
default... yes
Repo paths access is drwxrws---?
default... yes
hooks directories in repos are links: ...
1/1 ... ok 1/2 ... ok 1/3 ... ok 1/4 ... ok 1/5 ... ok 1/6 ... ok 1/7 ... ok 1/8 ... ok 1/9 ... ok 1/10 ... ok 1/11 ... ok 1/12 ... ok 1/13 ... ok 1/14 ... ok 1/15 ... ok 1/16 ... ok 1/17 ... ok 1/18 ... ok 1/19 ... ok 1/20 ... ok 1/21 ... ok 1/22 ... ok 1/23 ... ok 1/24 ... ok 1/25 ... ok 1/26 ... ok 1/27 ... ok 1/28 ... ok 1/29 ... ok 1/30 ... ok 1/31 ... ok 1/32 ... repository is empty 1/33 ... ok 1/34 ... ok 1/35 ... ok 1/36 ... ok 1/37 ... ok 1/38 ... ok 1/39 ... ok 1/40 ... ok 1/41 ... ok 1/42 ... ok 1/43 ... ok 1/44 ... ok 1/45 ... ok 1/46 ... repository is empty 1/47 ... ok 1/48 ... ok 1/49 ... ok 1/50 ... ok 1/51 ... ok 1/52 ... ok 1/53 ... ok 1/54 ... ok 1/55 ... repository is empty 1/56 ... ok 1/57 ... ok 1/58 ... ok 1/59 ... ok 1/60 ... ok 1/61 ... ok 1/62 ... ok 1/63 ... ok 1/64 ... ok 1/65 ... ok 1/66 ... ok 1/67 ... ok 1/68 ... ok 1/69 ... ok 1/70 ... ok 1/71 ... ok 1/72 ... ok 1/73 ... ok 1/74 ... ok 1/75 ... ok 1/76 ... ok 1/77 ... ok 1/78 ... ok 1/79 ... ok 1/80 ... ok 1/81 ... ok 1/82 ... ok 1/83 ... ok 1/84 ... ok 1/85 ... ok 1/86 ... ok 1/87 ... ok 1/88 ... ok 1/89 ... ok 1/90 ... ok 1/91 ... ok 1/92 ... ok 1/93 ... ok 1/94 ... ok
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Check GitLab API access: OK
Redis available via internal API: OK
Access to /var/opt/gitlab/.ssh/authorized_keys: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Sidekiq ...
Running? ... yes
Number of Sidekiq processes ... 1
Checking Sidekiq ... Finished
Reply by email is disabled in config/gitlab.yml
Checking LDAP ...
Server: ldapmain
not verifying SSL hostname of LDAPS server [redacted]
LDAP authentication... Success
LDAP users with access to your GitLab server (only showing the first 100 results)
[list of LDAP users redacted]
Checking LDAP ... Finished
Checking GitLab ...
Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ...
1/1 ... yes 1/2 ... yes 1/3 ... yes 1/4 ... yes 1/5 ... yes 1/6 ... yes 1/7 ... yes 1/8 ... yes 1/9 ... yes 1/10 ... yes 1/11 ... yes 1/12 ... yes 1/13 ... yes 1/14 ... yes 1/15 ... yes 1/16 ... yes 1/17 ... yes 1/18 ... yes 1/19 ... yes 1/20 ... yes 1/21 ... yes 1/22 ... yes 1/23 ... yes 1/24 ... yes 1/25 ... yes 1/26 ... yes 1/27 ... yes 1/28 ... yes 1/29 ... yes 1/30 ... yes 1/31 ... yes 1/32 ... yes 1/33 ... yes 1/34 ... yes 1/35 ... yes 1/36 ... yes 1/37 ... yes 1/38 ... yes 1/39 ... yes 1/40 ... yes 1/41 ... yes 1/42 ... yes 1/43 ... yes 1/44 ... yes 1/45 ... yes 1/46 ... yes 1/47 ... yes 1/48 ... yes 1/49 ... yes 1/50 ... yes 1/51 ... yes 1/52 ... yes 1/53 ... yes 1/54 ... yes 1/55 ... yes 1/56 ... yes 1/57 ... yes 1/58 ... yes 1/59 ... yes 1/60 ... yes 1/61 ... yes 1/62 ... yes 1/63 ... yes 1/64 ... yes 1/65 ... yes 1/66 ... yes 1/67 ... yes 1/68 ... yes 1/69 ... yes 1/70 ... yes 1/71 ... yes 1/72 ... yes 1/73 ... yes 1/74 ... yes 1/75 ... yes 1/76 ... yes 1/77 ... yes 1/78 ... yes 1/79 ... yes 1/80 ... yes 1/81 ... yes 1/82 ... yes 1/83 ... yes 1/84 ... yes 1/85 ... yes 1/86 ... yes 1/87 ... yes 1/88 ... yes 1/89 ... yes 1/90 ... yes 1/91 ... yes 1/92 ... yes 1/93 ... yes 1/94 ... yes
Redis version >= 2.8.0? ... yes
Ruby version >= 2.3.5 ? ... yes (2.4.4)
Git version >= 2.9.5 ? ... yes (2.17.1)
Git user has default SSH configuration? ... yes
Active users: ... [redacted]
Elasticsearch version 5.1 - 5.5? ... skipped (elasticsearch is disabled)
Checking GitLab ... Finished
Possible fixes
None that I know of other than the workarounds I mentioned above.