Gitaly fails to restart on Omnibus upgrade from 14.2.4 to 14.3.0
<!---
Please read this!
Before opening a new issue, make sure to search for keywords in the issues
filtered by the "regression" or "bug" label:
- https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=regression
- https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=bug
and verify the issue you're about to submit isn't a duplicate.
--->
### Summary
<!-- Summarize the bug encountered concisely. -->
During an Omnibus upgrade from 14.2.4 to 14.3.0 (EE), Gitaly fails to restart during the reconfigure. This left our Gitlab installation broken until services were manually restarted after the upgrade.
### Steps to reproduce
<!-- Describe how one can reproduce the issue - this is very important. Please use an ordered list. -->
Upgrade from 14.2.4 to 14.3.0.
### Example Project
<!-- If possible, please create an example project here on GitLab.com that exhibits the problematic
behavior, and link to it here in the bug report. If you are using an older version of GitLab, this
will also determine whether the bug is fixed in a more recent version. -->
N/A
### What is the current *bug* behavior?
<!-- Describe what actually happens. -->
During RPM installation, Gitlab is reconfigured and fails to restart Gitaly after upgrade.
### What is the expected *correct* behavior?
<!-- Describe what you should see instead. -->
Upgrade should work without issue, and all affected services restarted as necessary.
### Relevant logs and/or screenshots
<!-- Paste any relevant logs - please use code blocks (```) to format console output, logs, and code
as it's tough to read otherwise. -->
<details>
<summary>Relevant section from reconfigure output</summary>
<pre>
Recipe: gitaly::enable
* runit_service[gitaly] action restart
================================================================================
Error executing action `restart` on resource 'runit_service[gitaly]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '1'
---- Begin output of /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitaly ----
STDOUT: timeout: run: /opt/gitlab/service/gitaly: (pid 3026) 432112s, got TERM
STDERR:
---- End output of /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitaly ----
Ran /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitaly returned 1
Cookbook Trace:
---------------
/opt/gitlab/embedded/cookbooks/cache/cookbooks/runit/libraries/helpers.rb:136:in `tap'
/opt/gitlab/embedded/cookbooks/cache/cookbooks/runit/libraries/helpers.rb:136:in `safe_sv_shellout!'
/opt/gitlab/embedded/cookbooks/cache/cookbooks/runit/libraries/helpers.rb:164:in `restart_service'
/opt/gitlab/embedded/cookbooks/cache/cookbooks/runit/libraries/provider_runit_service.rb:357:in `block in <class:RunitService>'
Resource Declaration:
---------------------
# In /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitaly/recipes/enable.rb
103: runit_service 'gitaly' do
104: start_down node['gitaly']['ha']
105: options({
106: user: account_helper.gitlab_user,
107: groupname: account_helper.gitlab_group,
108: working_dir: working_dir,
109: env_dir: env_directory,
110: bin_path: gitaly_path,
111: wrapper_path: wrapper_path,
112: config_path: config_path,
113: log_directory: log_directory,
114: json_logging: json_logging,
115: open_files_ulimit: open_files_ulimit
116: }.merge(params))
117: log_options node['gitlab']['logging'].to_hash.merge(node['gitaly'].to_hash)
118: end
119:
Compiled Resource:
------------------
# Declared in /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitaly/recipes/enable.rb:103:in `from_file'
runit_service("gitaly") do
action [:enable]
default_guard_interpreter :default
declared_type :runit_service
cookbook_name "gitaly"
recipe_name "enable"
start_down false
service_name "gitaly"
options {:env_dir=>"/opt/gitlab/etc/gitaly/env", :user=>"git", :groupname=>"git", :working_dir=>"/var/opt/gitlab/gitaly", :bin_path=>"/opt/gitlab/embedded/bin/gitaly", :wrapper_path=>"/opt/gitlab/embedded/bin/gitaly-wrapper", :config_path=>"/var/opt/gitlab/gitaly/config.toml", :log_directory=>"/var/log/gitlab/gitaly", :json_logging=>true, :open_files_ulimit=>15000}
log_options {"svlogd_size"=>209715200, "svlogd_num"=>30, "svlogd_timeout"=>86400, "svlogd_filter"=>"gzip", "svlogd_udp"=>nil, "svlogd_prefix"=>nil, "udp_log_shipping_host"=>nil, "udp_log_shipping_hostname"=>nil, "udp_log_shipping_port"=>514, "logrotate_frequency"=>"daily", "logrotate_maxsize"=>nil, "logrotate_size"=>nil, "logrotate_rotate"=>30, "logrotate_compress"=>"compress", "logrotate_method"=>"copytruncate", "logrotate_postrotate"=>nil, "logrotate_dateformat"=>nil, "enable"=>true, "ha"=>false, "dir"=>"/var/opt/gitlab/gitaly", "internal_socket_dir"=>"/var/opt/gitlab/gitaly/internal_sockets", "log_directory"=>"/var/log/gitlab/gitaly", "env_directory"=>"/opt/gitlab/etc/gitaly/env", "graceful_restart_timeout"=>nil, "bin_path"=>"/opt/gitlab/embedded/bin/gitaly", "socket_path"=>"/var/opt/gitlab/gitaly/gitaly.socket", "listen_addr"=>nil, "tls_listen_addr"=>nil, "certificate_path"=>nil, "key_path"=>nil, "prometheus_listen_addr"=>"localhost:9236", "logging_level"=>nil, "logging_format"=>"json", "logging_sentry_dsn"=>nil, "logging_ruby_sentry_dsn"=>nil, "logging_sentry_environment"=>nil, "prometheus_grpc_latency_buckets"=>nil, "storage"=>[{"name"=>"default", "path"=>"/var/opt/gitlab/git-data/repositories"}], "auth_token"=>nil, "auth_transitioning"=>false, "git_catfile_cache_size"=>nil, "git_bin_path"=>nil, "open_files_ulimit"=>15000, "ruby_max_rss"=>nil, "ruby_graceful_restart_timeout"=>nil, "ruby_restart_delay"=>nil, "ruby_num_workers"=>nil, "ruby_rugged_git_config_search_path"=>"/opt/gitlab/embedded/etc", "concurrency"=>nil, "daily_maintenance_start_hour"=>nil, "daily_maintenance_start_minute"=>nil, "daily_maintenance_duration"=>nil, "daily_maintenance_storages"=>nil, "daily_maintenance_disabled"=>false, "cgroups_count"=>nil, "cgroups_mountpoint"=>nil, "cgroups_hierarchy_root"=>nil, "cgroups_memory_enabled"=>nil, "cgroups_memory_limit"=>nil, "cgroups_cpu_enabled"=>nil, "cgroups_cpu_shares"=>nil, "pack_objects_cache_enabled"=>false, "pack_objects_cache_dir"=>nil, "pack_objects_cache_max_age"=>nil, "consul_service_name"=>"gitaly", "custom_hooks_dir"=>nil, "consul_service_meta"=>nil, "env"=>{"HOME"=>"/var/opt/gitlab", "PATH"=>"/opt/gitlab/bin:/opt/gitlab/embedded/bin:/bin:/usr/bin", "TZ"=>":/etc/localtime", "PYTHONPATH"=>"/opt/gitlab/embedded/lib/python3.9/site-packages", "ICU_DATA"=>"/opt/gitlab/embedded/share/icu/current", "SSL_CERT_DIR"=>"/opt/gitlab/embedded/ssl/certs/", "GITALY_PID_FILE"=>"/var/opt/gitlab/gitaly/gitaly.pid", "WRAPPER_JSON_LOGGING"=>"true"}}
run_template_name "gitaly"
log_template_name "gitaly"
check_script_template_name "gitaly"
finish_script_template_name "gitaly"
control_template_names {}
end
System Info:
------------
chef_version=15.17.4
platform=centos
platform_version=7.7.1908
ruby=ruby 2.7.4p191 (2021-07-07 revision a21a3b7d23) [x86_64-linux]
program_name=/opt/gitlab/embedded/bin/chef-client
executable=/opt/gitlab/embedded/bin/chef-client
* runit_service[gitaly] action hup
- send hup to runit_service[gitaly]
Recipe: gitlab::gitlab-workhorse
* runit_service[gitlab-workhorse] action restart (up to date)
Recipe: gitlab-pages::enable
* runit_service[gitlab-pages] action restart (up to date)
Recipe: registry::enable
* runit_service[registry] action restart (up to date)
Recipe: monitoring::gitlab-exporter
* runit_service[gitlab-exporter] action restart (up to date)
Running handlers:
There was an error running gitlab-ctl reconfigure:
runit_service[gitaly] (gitaly::enable line 103) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitaly ----
STDOUT: timeout: run: /opt/gitlab/service/gitaly: (pid 3026) 432112s, got TERM
STDERR:
---- End output of /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitaly ----
Ran /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitaly returned 1
</pre>
</details>
<details>
<summary>Relevant Gitaly logs</summary>
<pre>
{"gitaly":2961,"level":"warning","msg":"forwarding signal","signal":18,"time":"2021-09-23T05:09:26.088Z","wrapper":3026}
{"gitaly":2961,"level":"warning","msg":"forwarding signal","signal":1,"time":"2021-09-23T05:09:56.758Z","wrapper":3026}
time="2021-09-23T05:09:57+01:00" level=info msg="Starting GitalyversionGitaly, version 14.3.0"
time="2021-09-23T05:09:57+01:00" level=warning msg="git path not configured. Using default path resolution" resolvedPath=/opt/gitlab/embedded/bin/git
{"latencies":[0.001,0.005,0.025,0.1,0.5,1,10,30,60,300,1500],"level":"info","msg":"grpc prometheus histograms enabled","time":"2021-09-23T05:09:57.047Z"}
{"level":"info","msg":"License database preloaded","time":"2021-09-23T05:09:59.992Z"}
{"level":"info","msg":"finished tempdir cleaner walk","storage":"default","time":"2021-09-23T05:09:59.992Z","time_ms":0}
{"level":"info","msg":"clearing disk cache object folder","storage":"default","time":"2021-09-23T05:09:59.992Z"}
{"level":"info","msg":"moving disk cache object folder to /var/opt/gitlab/git-data/repositories/+gitaly/tmp/diskcache520220024","storage":"default","time":"2021-09-23T05:09:59.993Z"}
{"level":"info","msg":"Starting file walker for /var/opt/gitlab/git-data/repositories/+gitaly/cache","path":"/var/opt/gitlab/git-data/repositories/+gitaly/cache","time":"2021-09-23T05:09:59.995Z"}
{"level":"info","msg":"Starting file walker for /var/opt/gitlab/git-data/repositories/+gitaly/state","path":"/var/opt/gitlab/git-data/repositories/+gitaly/state","time":"2021-09-23T05:09:59.995Z"}
{"level":"info","msg":"cleared all cache object files in /var/opt/gitlab/git-data/repositories/+gitaly/tmp/diskcache520220024 after 725.022µs","storage":"default","time":"2021-09-23T05:09:59.995Z"}
{"level":"info","msg":"starting RSS monitor","supervisor.name":"gitaly-ruby.1","supervisor.rss_threshold":209715200,"time":"2021-09-23T05:10:00.371Z"}
{"level":"info","msg":"starting RSS monitor","supervisor.name":"gitaly-ruby.0","supervisor.rss_threshold":209715200,"time":"2021-09-23T05:10:00.371Z"}
{"level":"warning","msg":"spawned","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","23116","/var/opt/gitlab/gitaly/internal_sockets/ruby.0"],"supervisor.name":"gitaly-ruby.0","supervisor.pid":23207,"
time":"2021-09-23T05:10:00.371Z"}
{"level":"warning","msg":"spawned","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","23116","/var/opt/gitlab/gitaly/internal_sockets/ruby.1"],"supervisor.name":"gitaly-ruby.1","supervisor.pid":23208,"
time":"2021-09-23T05:10:00.372Z"}
{"address":"/var/opt/gitlab/gitaly/gitaly.socket","level":"info","msg":"listening at unix address","time":"2021-09-23T05:10:00.378Z"}
{"address":"/var/opt/gitlab/gitaly/internal_sockets/internal_23116.sock","level":"info","msg":"listening at unix address","time":"2021-09-23T05:10:00.379Z"}
{"address":"localhost:9236","level":"info","msg":"starting prometheus listener","time":"2021-09-23T05:10:00.380Z"}
{"level":"info","msg":"maintenance: daily scheduled","pid":23116,"scheduled":"2021-09-23T12:00:00+01:00","time":"2021-09-23T05:10:00.380Z"}
{"level":"info","msg":"Upgrade succeeded","time":"2021-09-23T05:10:00.380Z"}
{"graceful_timeout":60000000000,"level":"warning","msg":"starting grace period","time":"2021-09-23T05:10:00.380Z"}
{"level":"warning","msg":"[core] grpc: addrConn.createTransport failed to connect to {unix:/var/opt/gitlab/gitaly/internal_sockets/internal_2961.sock localhost \u003cnil\u003e 0 \u003cnil\u003e}. Err: connection error: desc = \"transport: Error while dialing dial unix /var/opt/gitlab
/gitaly/internal_sockets/internal_2961.sock: connect: no such file or directory\". Reconnecting...","pid":2961,"system":"system","time":"2021-09-23T05:10:00.381Z"}
{"error":"signal: killed","level":"warning","msg":"exited","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","2961","/var/opt/gitlab/gitaly/internal_sockets/ruby.0"],"supervisor.name":"gitaly-ruby.0","
time":"2021-09-23T05:10:00.391Z"}
{"error":"signal: killed","level":"warning","msg":"exited","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","2961","/var/opt/gitlab/gitaly/internal_sockets/ruby.1"],"supervisor.name":"gitaly-ruby.1","
time":"2021-09-23T05:10:00.402Z"}
{"error":"graceful upgrade: completed","level":"error","msg":"shutting down","time":"2021-09-23T05:10:00.403Z"}
{"level":"info","msg":"Gitaly stopped","time":"2021-09-23T05:10:00.403Z"}
{"level":"info","msg":"PID 23207 BUNDLE_GEMFILE=/opt/gitlab/embedded/service/gitaly-ruby/Gemfile","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","23116","/var/opt/gitlab/gitaly/internal_sockets/ruby
.0"],"supervisor.name":"gitaly-ruby.0","time":"2021-09-23T05:10:00.650Z"}
{"level":"info","msg":"PID 23208 BUNDLE_GEMFILE=/opt/gitlab/embedded/service/gitaly-ruby/Gemfile","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","23116","/var/opt/gitlab/gitaly/internal_sockets/ruby
.1"],"supervisor.name":"gitaly-ruby.1","time":"2021-09-23T05:10:00.650Z"}
{"gitaly":2961,"level":"error","msg":"wrapper for gitaly shutting down","time":"2021-09-23T05:10:01.206Z","wrapper":3026}
{"level":"info","msg":"Wrapper started","time":"2021-09-23T05:10:01.229Z","wrapper":23215}
{"level":"info","msg":"finding gitaly","pid_file":"/var/opt/gitlab/gitaly/gitaly.pid","time":"2021-09-23T05:10:01.229Z","wrapper":23215}
{"level":"info","msg":"adopting a process","time":"2021-09-23T05:10:01.239Z","wrapper":23215}
{"gitaly":23116,"level":"info","msg":"monitoring gitaly","time":"2021-09-23T05:10:01.239Z","wrapper":23215}
{"level":"info","msg":"finished tempdir cleaner walk","storage":"default","time":"2021-09-23T06:10:00.018Z","time_ms":16}
{"level":"info","msg":"finished tempdir cleaner walk","storage":"default","time":"2021-09-23T07:10:00.019Z","time_ms":0}
{"gitaly":23116,"level":"warning","msg":"forwarding signal","signal":15,"time":"2021-09-23T07:34:24.155Z","wrapper":23215}
{"gitaly":23116,"level":"warning","msg":"forwarding signal","signal":18,"time":"2021-09-23T07:34:24.158Z","wrapper":23215}
{"error":"signal: killed","level":"warning","msg":"exited","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","23116","/var/opt/gitlab/gitaly/internal_sockets/ruby.0"],"supervisor.name":"gitaly-ruby.0","time":"2021-09-23T07:34:24.164Z"}
{"error":"signal: killed","level":"warning","msg":"exited","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","23116","/var/opt/gitlab/gitaly/internal_sockets/ruby.1"],"supervisor.name":"gitaly-ruby.1","time":"2021-09-23T07:34:24.170Z"}
{"level":"warning","msg":"[transport] transport: http2Server.HandleStreams failed to read frame: read unix /var/opt/gitlab/gitaly/internal_sockets/internal_23116.sock-\u003e@: use of closed network connection","pid":23116,"system":"system","time":"2021-09-23T07:34:24.171Z"}
{"error":"received signal \"terminated\"","level":"error","msg":"shutting down","time":"2021-09-23T07:34:24.171Z"}
{"level":"info","msg":"Gitaly stopped","time":"2021-09-23T07:34:24.171Z"}
{"gitaly":23116,"level":"error","msg":"wrapper for gitaly shutting down","time":"2021-09-23T07:34:24.518Z","wrapper":23215}
{"level":"info","msg":"Wrapper started","time":"2021-09-23T07:34:24.523Z","wrapper":28241}
</pre>
</details>
In the Gitaly logs, I believe the most relevant line is:
```
{"level":"warning","msg":"[transport] transport: http2Server.HandleStreams failed to read frame: read unix /var/opt/gitlab/gitaly/internal_sockets/internal_23116.sock-\u003e@: use of closed network connection","pid":23116,"system":"system","time":"2021-09-23T07:34:24.171Z"}
```
### Output of checks
<!-- If you are reporting a bug on GitLab.com, write: This bug happens on GitLab.com -->
#### Results of GitLab environment info
<!-- Input any relevant GitLab environment information if needed. -->
<details>
<summary>Expand for output related to GitLab environment info</summary>
<pre>
System information
System:
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.7.4p191
Gem Version: 3.1.4
Bundler Version:2.1.4
Rake Version: 13.0.6
Redis Version: 6.0.14
Git Version: 2.33.0.
Sidekiq Version:5.2.9
Go Version: unknown
GitLab information
Version: 14.3.0-ee
Revision: dec73e99fdd
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 12.7
URL: https://redacted.com
HTTP Clone URL: https://redacted.com/some-group/some-project.git
SSH Clone URL: git@redacted.com:some-group/some-project.git
Elasticsearch: yes
Geo: no
Using LDAP: no
Using Omniauth: yes
Omniauth Providers: openid_connect
GitLab Shell
Version: 13.21.0
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Git: /opt/gitlab/embedded/bin/git
</pre>
</details>
#### Results of GitLab application Check
<!-- Input any relevant GitLab application check information if needed. -->
<details>
<summary>Expand for output related to the GitLab application check</summary>
<pre>
Checking GitLab subtasks ...
Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 13.21.0 ? ... OK (13.21.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished
Checking Sidekiq ...
Sidekiq: ... Running? ... yes
Number of Sidekiq processes (cluster/worker) ... 1/1
Checking Sidekiq ... Finished
Checking Incoming Email ...
Incoming Email: ... Checking Reply by email ...
IMAP server credentials are correct? ... Checking gitlab@redacted.com
yes
Init.d configured correctly? ... skipped
MailRoom running? ... skipped
Checking Reply by email ... Finished
Checking Incoming Email ... Finished
Checking LDAP ...
LDAP: ... LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished
Checking GitLab App ...
Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ...
... snipped 1900 lines - all were 'yes' ...
Redis version >= 5.0.0? ... yes
Ruby version >= 2.7.2 ? ... yes (2.7.4)
Git version >= 2.31.0 ? ... yes (2.33.0)
Git user has default SSH configuration? ... yes
Active users: ... 186
Is authorized keys file accessible? ... yes
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes
Elasticsearch version 7.x (6.4 - 6.x deprecated to be removed in 13.8)? ... yes (6.8.4)
Checking GitLab App ... Finished
Checking GitLab subtasks ... Finished
</pre>
</details>
### Possible fixes
<!-- If you can, link to the line of code that might be responsible for the problem. -->
issue