Gitaly fails to restart on Omnibus upgrade from 14.2.4 to 14.3.0
<!--- Please read this! Before opening a new issue, make sure to search for keywords in the issues filtered by the "regression" or "bug" label: - https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=regression - https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=bug and verify the issue you're about to submit isn't a duplicate. ---> ### Summary <!-- Summarize the bug encountered concisely. --> During an Omnibus upgrade from 14.2.4 to 14.3.0 (EE), Gitaly fails to restart during the reconfigure. This left our Gitlab installation broken until services were manually restarted after the upgrade. ### Steps to reproduce <!-- Describe how one can reproduce the issue - this is very important. Please use an ordered list. --> Upgrade from 14.2.4 to 14.3.0. ### Example Project <!-- If possible, please create an example project here on GitLab.com that exhibits the problematic behavior, and link to it here in the bug report. If you are using an older version of GitLab, this will also determine whether the bug is fixed in a more recent version. --> N/A ### What is the current *bug* behavior? <!-- Describe what actually happens. --> During RPM installation, Gitlab is reconfigured and fails to restart Gitaly after upgrade. ### What is the expected *correct* behavior? <!-- Describe what you should see instead. --> Upgrade should work without issue, and all affected services restarted as necessary. ### Relevant logs and/or screenshots <!-- Paste any relevant logs - please use code blocks (```) to format console output, logs, and code as it's tough to read otherwise. --> <details> <summary>Relevant section from reconfigure output</summary> <pre> Recipe: gitaly::enable * runit_service[gitaly] action restart ================================================================================ Error executing action `restart` on resource 'runit_service[gitaly]' ================================================================================ Mixlib::ShellOut::ShellCommandFailed ------------------------------------ Expected process to exit with [0], but received '1' ---- Begin output of /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitaly ---- STDOUT: timeout: run: /opt/gitlab/service/gitaly: (pid 3026) 432112s, got TERM STDERR: ---- End output of /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitaly ---- Ran /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitaly returned 1 Cookbook Trace: --------------- /opt/gitlab/embedded/cookbooks/cache/cookbooks/runit/libraries/helpers.rb:136:in `tap' /opt/gitlab/embedded/cookbooks/cache/cookbooks/runit/libraries/helpers.rb:136:in `safe_sv_shellout!' /opt/gitlab/embedded/cookbooks/cache/cookbooks/runit/libraries/helpers.rb:164:in `restart_service' /opt/gitlab/embedded/cookbooks/cache/cookbooks/runit/libraries/provider_runit_service.rb:357:in `block in <class:RunitService>' Resource Declaration: --------------------- # In /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitaly/recipes/enable.rb 103: runit_service 'gitaly' do 104: start_down node['gitaly']['ha'] 105: options({ 106: user: account_helper.gitlab_user, 107: groupname: account_helper.gitlab_group, 108: working_dir: working_dir, 109: env_dir: env_directory, 110: bin_path: gitaly_path, 111: wrapper_path: wrapper_path, 112: config_path: config_path, 113: log_directory: log_directory, 114: json_logging: json_logging, 115: open_files_ulimit: open_files_ulimit 116: }.merge(params)) 117: log_options node['gitlab']['logging'].to_hash.merge(node['gitaly'].to_hash) 118: end 119: Compiled Resource: ------------------ # Declared in /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitaly/recipes/enable.rb:103:in `from_file' runit_service("gitaly") do action [:enable] default_guard_interpreter :default declared_type :runit_service cookbook_name "gitaly" recipe_name "enable" start_down false service_name "gitaly" options {:env_dir=>"/opt/gitlab/etc/gitaly/env", :user=>"git", :groupname=>"git", :working_dir=>"/var/opt/gitlab/gitaly", :bin_path=>"/opt/gitlab/embedded/bin/gitaly", :wrapper_path=>"/opt/gitlab/embedded/bin/gitaly-wrapper", :config_path=>"/var/opt/gitlab/gitaly/config.toml", :log_directory=>"/var/log/gitlab/gitaly", :json_logging=>true, :open_files_ulimit=>15000} log_options {"svlogd_size"=>209715200, "svlogd_num"=>30, "svlogd_timeout"=>86400, "svlogd_filter"=>"gzip", "svlogd_udp"=>nil, "svlogd_prefix"=>nil, "udp_log_shipping_host"=>nil, "udp_log_shipping_hostname"=>nil, "udp_log_shipping_port"=>514, "logrotate_frequency"=>"daily", "logrotate_maxsize"=>nil, "logrotate_size"=>nil, "logrotate_rotate"=>30, "logrotate_compress"=>"compress", "logrotate_method"=>"copytruncate", "logrotate_postrotate"=>nil, "logrotate_dateformat"=>nil, "enable"=>true, "ha"=>false, "dir"=>"/var/opt/gitlab/gitaly", "internal_socket_dir"=>"/var/opt/gitlab/gitaly/internal_sockets", "log_directory"=>"/var/log/gitlab/gitaly", "env_directory"=>"/opt/gitlab/etc/gitaly/env", "graceful_restart_timeout"=>nil, "bin_path"=>"/opt/gitlab/embedded/bin/gitaly", "socket_path"=>"/var/opt/gitlab/gitaly/gitaly.socket", "listen_addr"=>nil, "tls_listen_addr"=>nil, "certificate_path"=>nil, "key_path"=>nil, "prometheus_listen_addr"=>"localhost:9236", "logging_level"=>nil, "logging_format"=>"json", "logging_sentry_dsn"=>nil, "logging_ruby_sentry_dsn"=>nil, "logging_sentry_environment"=>nil, "prometheus_grpc_latency_buckets"=>nil, "storage"=>[{"name"=>"default", "path"=>"/var/opt/gitlab/git-data/repositories"}], "auth_token"=>nil, "auth_transitioning"=>false, "git_catfile_cache_size"=>nil, "git_bin_path"=>nil, "open_files_ulimit"=>15000, "ruby_max_rss"=>nil, "ruby_graceful_restart_timeout"=>nil, "ruby_restart_delay"=>nil, "ruby_num_workers"=>nil, "ruby_rugged_git_config_search_path"=>"/opt/gitlab/embedded/etc", "concurrency"=>nil, "daily_maintenance_start_hour"=>nil, "daily_maintenance_start_minute"=>nil, "daily_maintenance_duration"=>nil, "daily_maintenance_storages"=>nil, "daily_maintenance_disabled"=>false, "cgroups_count"=>nil, "cgroups_mountpoint"=>nil, "cgroups_hierarchy_root"=>nil, "cgroups_memory_enabled"=>nil, "cgroups_memory_limit"=>nil, "cgroups_cpu_enabled"=>nil, "cgroups_cpu_shares"=>nil, "pack_objects_cache_enabled"=>false, "pack_objects_cache_dir"=>nil, "pack_objects_cache_max_age"=>nil, "consul_service_name"=>"gitaly", "custom_hooks_dir"=>nil, "consul_service_meta"=>nil, "env"=>{"HOME"=>"/var/opt/gitlab", "PATH"=>"/opt/gitlab/bin:/opt/gitlab/embedded/bin:/bin:/usr/bin", "TZ"=>":/etc/localtime", "PYTHONPATH"=>"/opt/gitlab/embedded/lib/python3.9/site-packages", "ICU_DATA"=>"/opt/gitlab/embedded/share/icu/current", "SSL_CERT_DIR"=>"/opt/gitlab/embedded/ssl/certs/", "GITALY_PID_FILE"=>"/var/opt/gitlab/gitaly/gitaly.pid", "WRAPPER_JSON_LOGGING"=>"true"}} run_template_name "gitaly" log_template_name "gitaly" check_script_template_name "gitaly" finish_script_template_name "gitaly" control_template_names {} end System Info: ------------ chef_version=15.17.4 platform=centos platform_version=7.7.1908 ruby=ruby 2.7.4p191 (2021-07-07 revision a21a3b7d23) [x86_64-linux] program_name=/opt/gitlab/embedded/bin/chef-client executable=/opt/gitlab/embedded/bin/chef-client * runit_service[gitaly] action hup - send hup to runit_service[gitaly] Recipe: gitlab::gitlab-workhorse * runit_service[gitlab-workhorse] action restart (up to date) Recipe: gitlab-pages::enable * runit_service[gitlab-pages] action restart (up to date) Recipe: registry::enable * runit_service[registry] action restart (up to date) Recipe: monitoring::gitlab-exporter * runit_service[gitlab-exporter] action restart (up to date) Running handlers: There was an error running gitlab-ctl reconfigure: runit_service[gitaly] (gitaly::enable line 103) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1' ---- Begin output of /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitaly ---- STDOUT: timeout: run: /opt/gitlab/service/gitaly: (pid 3026) 432112s, got TERM STDERR: ---- End output of /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitaly ---- Ran /opt/gitlab/embedded/bin/sv restart /opt/gitlab/service/gitaly returned 1 </pre> </details> <details> <summary>Relevant Gitaly logs</summary> <pre> {"gitaly":2961,"level":"warning","msg":"forwarding signal","signal":18,"time":"2021-09-23T05:09:26.088Z","wrapper":3026} {"gitaly":2961,"level":"warning","msg":"forwarding signal","signal":1,"time":"2021-09-23T05:09:56.758Z","wrapper":3026} time="2021-09-23T05:09:57+01:00" level=info msg="Starting GitalyversionGitaly, version 14.3.0" time="2021-09-23T05:09:57+01:00" level=warning msg="git path not configured. Using default path resolution" resolvedPath=/opt/gitlab/embedded/bin/git {"latencies":[0.001,0.005,0.025,0.1,0.5,1,10,30,60,300,1500],"level":"info","msg":"grpc prometheus histograms enabled","time":"2021-09-23T05:09:57.047Z"} {"level":"info","msg":"License database preloaded","time":"2021-09-23T05:09:59.992Z"} {"level":"info","msg":"finished tempdir cleaner walk","storage":"default","time":"2021-09-23T05:09:59.992Z","time_ms":0} {"level":"info","msg":"clearing disk cache object folder","storage":"default","time":"2021-09-23T05:09:59.992Z"} {"level":"info","msg":"moving disk cache object folder to /var/opt/gitlab/git-data/repositories/+gitaly/tmp/diskcache520220024","storage":"default","time":"2021-09-23T05:09:59.993Z"} {"level":"info","msg":"Starting file walker for /var/opt/gitlab/git-data/repositories/+gitaly/cache","path":"/var/opt/gitlab/git-data/repositories/+gitaly/cache","time":"2021-09-23T05:09:59.995Z"} {"level":"info","msg":"Starting file walker for /var/opt/gitlab/git-data/repositories/+gitaly/state","path":"/var/opt/gitlab/git-data/repositories/+gitaly/state","time":"2021-09-23T05:09:59.995Z"} {"level":"info","msg":"cleared all cache object files in /var/opt/gitlab/git-data/repositories/+gitaly/tmp/diskcache520220024 after 725.022µs","storage":"default","time":"2021-09-23T05:09:59.995Z"} {"level":"info","msg":"starting RSS monitor","supervisor.name":"gitaly-ruby.1","supervisor.rss_threshold":209715200,"time":"2021-09-23T05:10:00.371Z"} {"level":"info","msg":"starting RSS monitor","supervisor.name":"gitaly-ruby.0","supervisor.rss_threshold":209715200,"time":"2021-09-23T05:10:00.371Z"} {"level":"warning","msg":"spawned","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","23116","/var/opt/gitlab/gitaly/internal_sockets/ruby.0"],"supervisor.name":"gitaly-ruby.0","supervisor.pid":23207," time":"2021-09-23T05:10:00.371Z"} {"level":"warning","msg":"spawned","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","23116","/var/opt/gitlab/gitaly/internal_sockets/ruby.1"],"supervisor.name":"gitaly-ruby.1","supervisor.pid":23208," time":"2021-09-23T05:10:00.372Z"} {"address":"/var/opt/gitlab/gitaly/gitaly.socket","level":"info","msg":"listening at unix address","time":"2021-09-23T05:10:00.378Z"} {"address":"/var/opt/gitlab/gitaly/internal_sockets/internal_23116.sock","level":"info","msg":"listening at unix address","time":"2021-09-23T05:10:00.379Z"} {"address":"localhost:9236","level":"info","msg":"starting prometheus listener","time":"2021-09-23T05:10:00.380Z"} {"level":"info","msg":"maintenance: daily scheduled","pid":23116,"scheduled":"2021-09-23T12:00:00+01:00","time":"2021-09-23T05:10:00.380Z"} {"level":"info","msg":"Upgrade succeeded","time":"2021-09-23T05:10:00.380Z"} {"graceful_timeout":60000000000,"level":"warning","msg":"starting grace period","time":"2021-09-23T05:10:00.380Z"} {"level":"warning","msg":"[core] grpc: addrConn.createTransport failed to connect to {unix:/var/opt/gitlab/gitaly/internal_sockets/internal_2961.sock localhost \u003cnil\u003e 0 \u003cnil\u003e}. Err: connection error: desc = \"transport: Error while dialing dial unix /var/opt/gitlab /gitaly/internal_sockets/internal_2961.sock: connect: no such file or directory\". Reconnecting...","pid":2961,"system":"system","time":"2021-09-23T05:10:00.381Z"} {"error":"signal: killed","level":"warning","msg":"exited","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","2961","/var/opt/gitlab/gitaly/internal_sockets/ruby.0"],"supervisor.name":"gitaly-ruby.0"," time":"2021-09-23T05:10:00.391Z"} {"error":"signal: killed","level":"warning","msg":"exited","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","2961","/var/opt/gitlab/gitaly/internal_sockets/ruby.1"],"supervisor.name":"gitaly-ruby.1"," time":"2021-09-23T05:10:00.402Z"} {"error":"graceful upgrade: completed","level":"error","msg":"shutting down","time":"2021-09-23T05:10:00.403Z"} {"level":"info","msg":"Gitaly stopped","time":"2021-09-23T05:10:00.403Z"} {"level":"info","msg":"PID 23207 BUNDLE_GEMFILE=/opt/gitlab/embedded/service/gitaly-ruby/Gemfile","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","23116","/var/opt/gitlab/gitaly/internal_sockets/ruby .0"],"supervisor.name":"gitaly-ruby.0","time":"2021-09-23T05:10:00.650Z"} {"level":"info","msg":"PID 23208 BUNDLE_GEMFILE=/opt/gitlab/embedded/service/gitaly-ruby/Gemfile","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","23116","/var/opt/gitlab/gitaly/internal_sockets/ruby .1"],"supervisor.name":"gitaly-ruby.1","time":"2021-09-23T05:10:00.650Z"} {"gitaly":2961,"level":"error","msg":"wrapper for gitaly shutting down","time":"2021-09-23T05:10:01.206Z","wrapper":3026} {"level":"info","msg":"Wrapper started","time":"2021-09-23T05:10:01.229Z","wrapper":23215} {"level":"info","msg":"finding gitaly","pid_file":"/var/opt/gitlab/gitaly/gitaly.pid","time":"2021-09-23T05:10:01.229Z","wrapper":23215} {"level":"info","msg":"adopting a process","time":"2021-09-23T05:10:01.239Z","wrapper":23215} {"gitaly":23116,"level":"info","msg":"monitoring gitaly","time":"2021-09-23T05:10:01.239Z","wrapper":23215} {"level":"info","msg":"finished tempdir cleaner walk","storage":"default","time":"2021-09-23T06:10:00.018Z","time_ms":16} {"level":"info","msg":"finished tempdir cleaner walk","storage":"default","time":"2021-09-23T07:10:00.019Z","time_ms":0} {"gitaly":23116,"level":"warning","msg":"forwarding signal","signal":15,"time":"2021-09-23T07:34:24.155Z","wrapper":23215} {"gitaly":23116,"level":"warning","msg":"forwarding signal","signal":18,"time":"2021-09-23T07:34:24.158Z","wrapper":23215} {"error":"signal: killed","level":"warning","msg":"exited","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","23116","/var/opt/gitlab/gitaly/internal_sockets/ruby.0"],"supervisor.name":"gitaly-ruby.0","time":"2021-09-23T07:34:24.164Z"} {"error":"signal: killed","level":"warning","msg":"exited","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","23116","/var/opt/gitlab/gitaly/internal_sockets/ruby.1"],"supervisor.name":"gitaly-ruby.1","time":"2021-09-23T07:34:24.170Z"} {"level":"warning","msg":"[transport] transport: http2Server.HandleStreams failed to read frame: read unix /var/opt/gitlab/gitaly/internal_sockets/internal_23116.sock-\u003e@: use of closed network connection","pid":23116,"system":"system","time":"2021-09-23T07:34:24.171Z"} {"error":"received signal \"terminated\"","level":"error","msg":"shutting down","time":"2021-09-23T07:34:24.171Z"} {"level":"info","msg":"Gitaly stopped","time":"2021-09-23T07:34:24.171Z"} {"gitaly":23116,"level":"error","msg":"wrapper for gitaly shutting down","time":"2021-09-23T07:34:24.518Z","wrapper":23215} {"level":"info","msg":"Wrapper started","time":"2021-09-23T07:34:24.523Z","wrapper":28241} </pre> </details> In the Gitaly logs, I believe the most relevant line is: ``` {"level":"warning","msg":"[transport] transport: http2Server.HandleStreams failed to read frame: read unix /var/opt/gitlab/gitaly/internal_sockets/internal_23116.sock-\u003e@: use of closed network connection","pid":23116,"system":"system","time":"2021-09-23T07:34:24.171Z"} ``` ### Output of checks <!-- If you are reporting a bug on GitLab.com, write: This bug happens on GitLab.com --> #### Results of GitLab environment info <!-- Input any relevant GitLab environment information if needed. --> <details> <summary>Expand for output related to GitLab environment info</summary> <pre> System information System: Proxy: no Current User: git Using RVM: no Ruby Version: 2.7.4p191 Gem Version: 3.1.4 Bundler Version:2.1.4 Rake Version: 13.0.6 Redis Version: 6.0.14 Git Version: 2.33.0. Sidekiq Version:5.2.9 Go Version: unknown GitLab information Version: 14.3.0-ee Revision: dec73e99fdd Directory: /opt/gitlab/embedded/service/gitlab-rails DB Adapter: PostgreSQL DB Version: 12.7 URL: https://redacted.com HTTP Clone URL: https://redacted.com/some-group/some-project.git SSH Clone URL: git@redacted.com:some-group/some-project.git Elasticsearch: yes Geo: no Using LDAP: no Using Omniauth: yes Omniauth Providers: openid_connect GitLab Shell Version: 13.21.0 Repository storage paths: - default: /var/opt/gitlab/git-data/repositories GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell Git: /opt/gitlab/embedded/bin/git </pre> </details> #### Results of GitLab application Check <!-- Input any relevant GitLab application check information if needed. --> <details> <summary>Expand for output related to the GitLab application check</summary> <pre> Checking GitLab subtasks ... Checking GitLab Shell ... GitLab Shell: ... GitLab Shell version >= 13.21.0 ? ... OK (13.21.0) Running /opt/gitlab/embedded/service/gitlab-shell/bin/check Internal API available: OK Redis available via internal API: OK gitlab-shell self-check successful Checking GitLab Shell ... Finished Checking Gitaly ... Gitaly: ... default ... OK Checking Gitaly ... Finished Checking Sidekiq ... Sidekiq: ... Running? ... yes Number of Sidekiq processes (cluster/worker) ... 1/1 Checking Sidekiq ... Finished Checking Incoming Email ... Incoming Email: ... Checking Reply by email ... IMAP server credentials are correct? ... Checking gitlab@redacted.com yes Init.d configured correctly? ... skipped MailRoom running? ... skipped Checking Reply by email ... Finished Checking Incoming Email ... Finished Checking LDAP ... LDAP: ... LDAP is disabled in config/gitlab.yml Checking LDAP ... Finished Checking GitLab App ... Git configured correctly? ... yes Database config exists? ... yes All migrations up? ... yes Database contains orphaned GroupMembers? ... no GitLab config exists? ... yes GitLab config up to date? ... yes Log directory writable? ... yes Tmp directory writable? ... yes Uploads directory exists? ... yes Uploads directory has correct permissions? ... yes Uploads directory tmp has correct permissions? ... yes Init script exists? ... skipped (omnibus-gitlab has no init script) Init script up-to-date? ... skipped (omnibus-gitlab has no init script) Projects have namespace: ... ... snipped 1900 lines - all were 'yes' ... Redis version >= 5.0.0? ... yes Ruby version >= 2.7.2 ? ... yes (2.7.4) Git version >= 2.31.0 ? ... yes (2.33.0) Git user has default SSH configuration? ... yes Active users: ... 186 Is authorized keys file accessible? ... yes GitLab configured to store new projects in hashed storage? ... yes All projects are in hashed storage? ... yes Elasticsearch version 7.x (6.4 - 6.x deprecated to be removed in 13.8)? ... yes (6.8.4) Checking GitLab App ... Finished Checking GitLab subtasks ... Finished </pre> </details> ### Possible fixes <!-- If you can, link to the line of code that might be responsible for the problem. -->
issue