`gitlab-ctl reconfigure` Hangs When Invoked Via systemd (CentOS 7.5 latest)
Summary
`gitlab-ctl reconfigure` hangs when invoked via systemd. This appears to affect all 11.x versions tested (11.0.0, 11.3.5 and 11.4.5).
Steps to reproduce
On a hardened CentOS 7 AWS instance, use CloudFormation's `AWS::CloudFormation::Init` to install and configure GitLab 11.x
Example Project
Methods used in cfn-gitlab worked for GitLab 9.2.x through 10.8.7 when building on STIG-hardened Red Hat 7 or CentOS 7 instances launched from STIG-partitioned AMIs
What is the current bug behavior?
When executed from systemd, either via `AWS::CloudFormation::Init` (as noted above) or as a "oneshot" invocation on reboot (a last-ditch diagnostic I tried while isolating the fault), the `gitlab-ctl reconfigure` routine hangs at:
Recipe: runit::systemd
* directory[/usr/lib/systemd/system] action create (up to date)
* cookbook_file[/usr/lib/systemd/system/gitlab-runsvdir.service] action create
- create new file /usr/lib/systemd/system/gitlab-runsvdir.service
- update content in file /usr/lib/systemd/system/gitlab-runsvdir.service from none to 6ca59d
--- /usr/lib/systemd/system/gitlab-runsvdir.service 2018-11-15 22:43:44.832396526 +0000
+++ /usr/lib/systemd/system/.chef-gitlab-runsvdir20181115-1461-ro35uc.service 2018-11-15 22:43:44.832396526 +0000
@@ -1 +1,11 @@
+[Unit]
+Description=GitLab Runit supervision process
+After=multi-user.target
+
+[Service]
+ExecStart=/opt/gitlab/embedded/bin/runsvdir-start
+Restart=always
+
+[Install]
+WantedBy=multi-user.target
- change mode from '' to '0644'
- restore selinux security context
* execute[systemctl daemon-reload] action run
- execute systemctl daemon-reload
* execute[systemctl enable gitlab-runsvdir] action run
[execute] Created symlink from /etc/systemd/system/multi-user.target.wants/gitlab-runsvdir.service to /usr/lib/systemd/system/gitlab-runsvdir.service.
- execute systemctl enable gitlab-runsvdir
* execute[systemctl start gitlab-runsvdir] action run
No other meaningful events are logged. Additionally, the systemd unit shows as exited/dead.
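For anyone trying to reproduce this, the following is a minimal sketch of how systemd's view of the unit could be captured while the start step appears stuck. The helper name `snapshot_unit_state` is hypothetical, and the choice of commands is an assumption about what might be informative, not a confirmed diagnosis. One thing that may be worth checking in the output is whether a start job for `gitlab-runsvdir.service` is still queued, since the generated unit shown above declares `After=multi-user.target`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: snapshot systemd's view of a unit from a second shell
# while `gitlab-ctl reconfigure` appears hung. Each call is wrapped in
# `timeout` and uses --no-pager so it cannot itself block on a wedged or
# TTY-less system.
snapshot_unit_state() {
  local unit="${1:-gitlab-runsvdir.service}"

  if ! command -v systemctl >/dev/null 2>&1; then
    echo "systemctl not found; nothing to collect" >&2
    return 0
  fi

  # Jobs systemd currently has queued or running (a stuck start job would show here).
  timeout 10 systemctl --no-pager list-jobs || true

  # Current state of the unit; a non-zero exit is normal for inactive units.
  timeout 10 systemctl --no-pager status "$unit" || true

  # Selected properties, including the ordering dependencies of the unit.
  timeout 10 systemctl show -p ActiveState -p SubState -p After "$unit" || true

  # Journal entries for the unit since the current boot.
  timeout 10 journalctl --no-pager -b -u "$unit" || true

  return 0
}
```

Run as root from a separate SSH session, for example `snapshot_unit_state gitlab-runsvdir.service > /tmp/unit-state.txt 2>&1`, so the output can be attached to the report.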
Further:
- attempting to restart the systemd unit hangs
- attempting to manually re-run `gitlab-ctl reconfigure` fails, either before or after a reboot
- removing the gitlab-ce RPM and reinstalling doesn't help, either with or without rebooting between the initial hang and the remove/reinstall of the gitlab-ce RPM
Basically, once the reconfigure hangs, the system is not recoverable: one has to launch a new system and start fresh.
That said:
- If one takes a fresh system and invokes the scripts (from an interactive shell) that would normally be invoked via systemd, everything works fine
- If one wraps the scripts in a systemd oneshot unit, then invokes the unit via systemctl, the scripts work fine
- If one simulates the TTY-less process environment that systemd sets up for the services it invokes by running `setsid bash -c 'env -i <wrapper_script>'`, the `gitlab-ctl reconfigure` also succeeds
The hang seems to be wholly confined to the use case where no interactive shell is involved, directly or indirectly, in invoking the utility.
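To make the working cases concrete, here is a rough sketch of the kind of oneshot wrapper unit and manual invocation described in the list above. The unit name `gitlab-provision.service`, the helper `write_oneshot_unit`, and the script path are hypothetical placeholders for illustration and are not the actual names used in cfn-gitlab.

```shell
#!/usr/bin/env bash
# Sketch only: unit name and script path are hypothetical placeholders.
# Writes a minimal oneshot unit that runs a provisioning wrapper script once.
write_oneshot_unit() {
  local unit_dir="$1"   # e.g. /etc/systemd/system
  local script="$2"     # e.g. /usr/local/sbin/gitlab-provision.sh (hypothetical)

  cat > "${unit_dir}/gitlab-provision.service" <<EOF
[Unit]
Description=Run GitLab provisioning wrapper once (illustrative)

[Service]
Type=oneshot
ExecStart=${script}
RemainAfterExit=yes
EOF
}

# Manual invocation from an interactive shell (a case reported as working):
#   write_oneshot_unit /etc/systemd/system /usr/local/sbin/gitlab-provision.sh
#   systemctl daemon-reload
#   systemctl start gitlab-provision.service
#
# TTY-less simulation quoted in the report (also reported as working):
#   setsid bash -c 'env -i /usr/local/sbin/gitlab-provision.sh'
```

The unit deliberately has no `[Install]` section, so it only runs when started by hand; the failing case in this report is the one where systemd itself triggers the scripts during boot or via `AWS::CloudFormation::Init`.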
What is the expected correct behavior?
My automated provisioning scripts function under GitLab 11.x as they did under 9.2.x up through 10.8.7
Relevant logs and/or screenshots
See above
Output of checks
N/A
Results of GitLab environment info
# gitlab-rake gitlab:env:info
System information
System:
Current User: git
Using RVM: no
Ruby Version: 2.4.5p335
Gem Version: 2.7.6
Bundler Version:1.16.2
Rake Version: 12.3.1
Redis Version: 3.2.12
Git Version: 2.18.1
Sidekiq Version:5.2.1
Go Version: unknown
GitLab information
Version: 11.4.5
Revision: f5536c6
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: postgresql
URL: https://gitlab.dev.lab
HTTP Clone URL: https://gitlab.dev.lab/some-group/some-project.git
SSH Clone URL: git@gitlab.dev.lab:some-group/some-project.git
Using LDAP: yes
Using Omniauth: yes
Omniauth Providers:
GitLab Shell
Version: 8.3.3
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
Hooks: /opt/gitlab/embedded/service/gitlab-shell/hooks
Git: /opt/gitlab/embedded/bin/git
Results of GitLab application Check
After the hang condition happens, the application-check fails as follows:
# gitlab-rake gitlab:check SANITIZE=true
Checking GitLab Shell ...
GitLab Shell version >= 8.3.3 ? ... OK (8.3.3)
Repo base directory exists?
default... yes
Repo storage directories are symlinks?
default... no
Repo paths owned by git:root, or git:git?
default... yes
Repo paths access is drwxrws---?
default... yes
hooks directories in repos are links: ...
13/20 ... rake aborted!
Errno::ENOENT: No such file or directory - connect(2) for /var/opt/gitlab/redis/redis.socket
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/repository_cache.rb:30:in `read'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/repository_cache.rb:38:in `fetch_without_caching_false'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/repository_cache_adapter.rb:112:in `block (2 levels) in cache_method_output_asymmetrically'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/null_request_store.rb:34:in `fetch'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/safe_request_store.rb:12:in `fetch'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/repository_cache.rb:22:in `fetch'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/repository_cache_adapter.rb:111:in `block in cache_method_output_asymmetrically'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/utils/strong_memoize.rb:26:in `strong_memoize'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/repository_cache_adapter.rb:125:in `block in memoize_method_output'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/repository_cache_adapter.rb:134:in `no_repository_fallback'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/repository_cache_adapter.rb:124:in `memoize_method_output'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/repository_cache_adapter.rb:110:in `cache_method_output_asymmetrically'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/repository_cache_adapter.rb:38:in `block in cache_method_asymmetrically'
/opt/gitlab/embedded/service/gitlab-rails/app/models/repository.rb:519:in `empty?'
/opt/gitlab/embedded/service/gitlab-rails/app/models/project.rb:563:in `empty_repo?'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/check.rake:188:in `block in check_repos_hooks_directory_is_link'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/check.rake:184:in `check_repos_hooks_directory_is_link'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/check.rake:52:in `block (4 levels) in <top (required)>'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/gitaly_client/storage_settings.rb:29:in `block in allow_disk_access'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/temporarily_allow.rb:7:in `temporarily_allow'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/gitaly_client/storage_settings.rb:29:in `allow_disk_access'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/check.rake:47:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `<main>'
Tasks: TOP => gitlab:check => gitlab:gitlab_shell:check
(See full trace by running task with --trace)
Given that the service dies and can't be restarted (meaning there is no Redis socket), the check abort makes sense.
Possible fixes
Can suggest none.