Error `14:failed to connect to all addresses` using omnibus-gitlab docker image - Gitaly bootstrap fails with `address already in use`
Summary
Gitaly sometimes fails to restart during the startup process of omnibus-gitlab docker containers, leading to QA tests failing with the error 14:failed to connect to all addresses
when making an API call to create a project.
More info
First noticed in elasticsearch tests: #197207 (closed)
But also found in other tests: https://gitlab.com/gitlab-org/gitlab-qa/-/jobs/421069608
The QA tests run against an omnibus-gitlab docker container. The latter set of tests use the default omnibus-gitlab configuration, and so does the former (but also sets ELASTIC_URL
).
Steps to reproduce
It happens merely by running the QA tests using gitlab-qa
, but unfortunately successful in reproducing it reliably. The jobs that failed passed on retry.
The command to run the test as in https://gitlab.com/gitlab-org/gitlab-qa/-/jobs/421069608 would be gitlab-qa Test::Instance::Image EE
What is the current bug behavior?
Gitaly bootstrap sometimes fails with address already in use
upon restart when using an omnibus-gitlab docker image.
What is the expected correct behavior?
Gitaly bootstrap succeeds.
Relevant logs and/or screenshots
The problem with gitaly seems to happen during the omnibus-gitlab startup process. It looks like it restarts all the services right at the end, including gitaly (from https://gitlab.com/gitlab-org/gitlab-qa/-/jobs/416011673):
[0mRecipe: <Dynamically Defined Resource>[0m
* service[gitaly] action restart
[32m- restart service service[gitaly][0m
[0mRecipe: gitaly::enable[0m
* runit_service[gitaly] action hup
[32m- send hup to runit_service[gitaly][0m
Logs for reconfigure
show the timing:
[2020-01-27T13:46:36+00:00] INFO: env_dir[/opt/gitlab/etc/gitaly/env] sending restart action to service[gitaly] (delayed)
[2020-01-27T13:46:36+00:00] INFO: service[gitaly] restarted
[2020-01-27T13:46:36+00:00] INFO: template[Create Gitaly config.toml] sending hup action to runit_service[gitaly] (delayed)
[2020-01-27T13:46:36+00:00] INFO: runit_service[gitaly] signalled (HUP)
[2020-01-27T13:46:36+00:00] INFO: runit_service[gitaly] sent hup
And the corresponding gitaly logs look normal (AFAIK) right up until the end:
{"gitaly":434,"level":"warning","msg":"forwarding signal","signal":15,"time":"2020-01-27T13:46:36Z","wrapper":427}
{"gitaly":434,"level":"warning","msg":"forwarding signal","signal":18,"time":"2020-01-27T13:46:36Z","wrapper":427}
{"error":"signal: killed","level":"warning","msg":"exited","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","434","/var/opt/gitlab/gitaly/internal_sockets/ruby.0"],"supervisor.name":"gitaly-ruby.0","time":"2020-01-27T13:46:36Z"}
{"error":"signal: killed","level":"warning","msg":"exited","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","434","/var/opt/gitlab/gitaly/internal_sockets/ruby.1"],"supervisor.name":"gitaly-ruby.1","time":"2020-01-27T13:46:36Z"}
{"error":"received signal \"terminated\"","level":"error","msg":"shutting down","time":"2020-01-27T13:46:36Z"}
{"gitaly":434,"level":"warning","msg":"forwarding signal","signal":17,"time":"2020-01-27T13:46:36Z","wrapper":427}
{"error":"os: process already finished","gitaly":434,"level":"error","msg":"can't forward the signal","signal":17,"time":"2020-01-27T13:46:36Z","wrapper":427}
{"gitaly":434,"level":"error","msg":"wrapper for gitaly shutting down","time":"2020-01-27T13:46:36Z","wrapper":427}
{"level":"info","msg":"Wrapper started","time":"2020-01-27T13:46:36Z","wrapper":1318}
{"level":"info","msg":"finding gitaly","pid_file":"/var/opt/gitlab/gitaly/gitaly.pid","time":"2020-01-27T13:46:36Z","wrapper":1318}
{"level":"info","msg":"spawning a process","time":"2020-01-27T13:46:36Z","wrapper":1318}
time="2020-01-27T13:46:37Z" level=info msg="Starting Gitaly" version="Gitaly, version 1.83.0"
time="2020-01-27T13:46:37Z" level=warning msg="git path not configured. Using default path resolution" resolvedPath=/opt/gitlab/embedded/bin/git
time="2020-01-27T13:46:37Z" level=info msg="clearing disk cache object folder" storage=default
time="2020-01-27T13:46:37Z" level=info msg="moving disk cache object folder to /var/opt/gitlab/git-data/repositories/+gitaly/tmp/diskcache073245658" storage=default
time="2020-01-27T13:46:37Z" level=info msg="disk cache object folder doesn't exist, no need to remove" storage=default
time="2020-01-27T13:46:37Z" level=info msg="Starting file walker for /var/opt/gitlab/git-data/repositories/+gitaly/cache" storage=default
time="2020-01-27T13:46:37Z" level=info msg="Starting file walker for /var/opt/gitlab/git-data/repositories/+gitaly/state" storage=default
{"address":"/var/opt/gitlab/gitaly/gitaly.socket","level":"info","msg":"listening at unix address","time":"2020-01-27T13:46:37Z"}
{"address":"/var/opt/gitlab/gitaly/internal_sockets/internal.sock","level":"info","msg":"listening at unix address","time":"2020-01-27T13:46:37Z"}
{"level":"info","msg":"finished tempdir cleaner walk","storage":"default","time":"2020-01-27T13:46:37Z","time_ms":0}
{"level":"info","msg":"Wrapper started","time":"2020-01-27T13:46:38Z","wrapper":1356}
{"level":"info","msg":"finding gitaly","pid_file":"/var/opt/gitlab/gitaly/gitaly.pid","time":"2020-01-27T13:46:38Z","wrapper":1356}
{"level":"info","msg":"spawning a process","time":"2020-01-27T13:46:38Z","wrapper":1356}
{"gitaly":1366,"level":"info","msg":"monitoring gitaly","time":"2020-01-27T13:46:38Z","wrapper":1356}
time="2020-01-27T13:46:38Z" level=info msg="Starting Gitaly" version="Gitaly, version 1.83.0"
time="2020-01-27T13:46:38Z" level=warning msg="git path not configured. Using default path resolution" resolvedPath=/opt/gitlab/embedded/bin/git
time="2020-01-27T13:46:38Z" level=info msg="clearing disk cache object folder" storage=default
time="2020-01-27T13:46:38Z" level=info msg="moving disk cache object folder to /var/opt/gitlab/git-data/repositories/+gitaly/tmp/diskcache118588225" storage=default
time="2020-01-27T13:46:38Z" level=info msg="disk cache object folder doesn't exist, no need to remove" storage=default
time="2020-01-27T13:46:38Z" level=info msg="Starting file walker for /var/opt/gitlab/git-data/repositories/+gitaly/cache" storage=default
time="2020-01-27T13:46:38Z" level=info msg="Starting file walker for /var/opt/gitlab/git-data/repositories/+gitaly/state" storage=default
{"address":"localhost:9236","level":"info","msg":"starting prometheus listener","time":"2020-01-27T13:46:38Z"}
{"level":"info","msg":"starting RSS monitor","supervisor.name":"gitaly-ruby.1","supervisor.rss_threshold":209715200,"time":"2020-01-27T13:46:38Z"}
{"level":"info","msg":"starting RSS monitor","supervisor.name":"gitaly-ruby.0","supervisor.rss_threshold":209715200,"time":"2020-01-27T13:46:38Z"}
{"level":"warning","msg":"spawned","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","1325","/var/opt/gitlab/gitaly/internal_sockets/ruby.1"],"supervisor.name":"gitaly-ruby.1","supervisor.pid":1375,"time":"2020-01-27T13:46:38Z"}
{"level":"warning","msg":"spawned","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","1325","/var/opt/gitlab/gitaly/internal_sockets/ruby.0"],"supervisor.name":"gitaly-ruby.0","supervisor.pid":1376,"time":"2020-01-27T13:46:38Z"}
{"level":"info","msg":"PID 1376 BUNDLE_GEMFILE=/opt/gitlab/embedded/service/gitaly-ruby/Gemfile","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","1325","/var/opt/gitlab/gitaly/internal_sockets/ruby.0"],"supervisor.name":"gitaly-ruby.0","time":"2020-01-27T13:46:39Z"}
{"level":"info","msg":"PID 1375 BUNDLE_GEMFILE=/opt/gitlab/embedded/service/gitaly-ruby/Gemfile","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","1325","/var/opt/gitlab/gitaly/internal_sockets/ruby.1"],"supervisor.name":"gitaly-ruby.1","time":"2020-01-27T13:46:39Z"}
{"level":"info","msg":"finished tempdir cleaner walk","storage":"default","time":"2020-01-27T13:46:40Z","time_ms":0}
{"address":"/var/opt/gitlab/gitaly/gitaly.socket","level":"info","msg":"listening at unix address","time":"2020-01-27T13:46:40Z"}
{"address":"/var/opt/gitlab/gitaly/internal_sockets/internal.sock","level":"info","msg":"listening at unix address","time":"2020-01-27T13:46:40Z"}
{"error":"unable to start the bootstrap: can't create new listener: listen tcp 127.0.0.1:9236: bind: address already in use","level":"error","msg":"shutting down","time":"2020-01-27T13:46:40Z"}
Here are all the logs from that job - elasticsearch_job.zip
And the logs from https://gitlab.com/gitlab-org/gitlab-qa/-/jobs/421069608 - ee_instance_job.zip
Output of checks
Results of GitLab environment info
From the omnibus-gitlab docker image used in https://gitlab.com/gitlab-org/gitlab-qa/-/jobs/421069608
Expand for output related to GitLab environment info
System information System: Proxy: no Current User: git Using RVM: no Ruby Version: 2.6.5p114 Gem Version: 2.7.10 Bundler Version:1.17.3 Rake Version: 12.3.3 Redis Version: 5.0.7 Git Version: 2.24.1 Sidekiq Version:5.2.7 Go Version: unknown GitLab information Version: 12.7.0-pre Revision: defea506b0f Directory: /opt/gitlab/embedded/service/gitlab-rails DB Adapter: PostgreSQL DB Version: 10.9 URL: http://gitlab.test HTTP Clone URL: http://gitlab.test/some-group/some-project.git SSH Clone URL: git@gitlab.test:some-group/some-project.git Elasticsearch: no Geo: no Using LDAP: no Using Omniauth: yes Omniauth Providers: GitLab Shell Version: 11.0.0 Repository storage paths: - default: /var/opt/gitlab/git-data/repositories GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell Git: /opt/gitlab/embedded/bin/git
Results of GitLab application Check
Expand for output related to the GitLab application check
Checking GitLab subtasks ... Checking GitLab Shell ... GitLab Shell: ... GitLab Shell version >= 11.0.0 ? ... OK (11.0.0) Running /opt/gitlab/embedded/service/gitlab-shell/bin/check Internal API available: OK Redis available via internal API: OK gitlab-shell self-check successful Checking GitLab Shell ... Finished Checking Gitaly ... Gitaly: ... default ... OK Checking Gitaly ... Finished Checking Sidekiq ... Sidekiq: ... Running? ... yes Number of Sidekiq processes ... 1 Checking Sidekiq ... Finished Checking Incoming Email ... Incoming Email: ... Reply by email is disabled in config/gitlab.yml Checking Incoming Email ... Finished Checking LDAP ... LDAP: ... LDAP is disabled in config/gitlab.yml Checking LDAP ... Finished Checking GitLab App ... Git configured correctly? ... yes Database config exists? ... yes All migrations up? ... yes Database contains orphaned GroupMembers? ... no GitLab config exists? ... yes GitLab config up to date? ... yes Log directory writable? ... yes Tmp directory writable? ... yes Uploads directory exists? ... yes Uploads directory has correct permissions? ... yes Uploads directory tmp has correct permissions? ... skipped (no tmp uploads folder yet) Init script exists? ... skipped (omnibus-gitlab has no init script) Init script up-to-date? ... skipped (omnibus-gitlab has no init script) Projects have namespace: ... can't check, you have no projects Redis version >= 2.8.0? ... yes Ruby version >= 2.5.3 ? ... yes (2.6.5) Git version >= 2.22.0 ? ... yes (2.24.1) Git user has default SSH configuration? ... yes Active users: ... 1 Is authorized keys file accessible? ... yes Elasticsearch version 5.6 - 6.x? ... skipped (elasticsearch is disabled) Checking GitLab App ... Finished Checking GitLab subtasks ... Finished