Skip to content

Error `14:failed to connect to all addresses` using omnibus-gitlab docker image - Gitaly bootstrap fails with `address already in use`

Summary

Gitaly sometimes fails to restart during the startup process of omnibus-gitlab docker containers, leading to QA tests failing with the error 14:failed to connect to all addresses when making an API call to create a project.

More info

First noticed in elasticsearch tests: #197207 (closed)

But also found in other tests: https://gitlab.com/gitlab-org/gitlab-qa/-/jobs/421069608

The QA tests run against an omnibus-gitlab docker container. The latter set of tests use the default omnibus-gitlab configuration, and so does the former (but also sets ELASTIC_URL).

Steps to reproduce

It happens merely by running the QA tests using gitlab-qa, but unfortunately successful in reproducing it reliably. The jobs that failed passed on retry.

The command to run the test as in https://gitlab.com/gitlab-org/gitlab-qa/-/jobs/421069608 would be gitlab-qa Test::Instance::Image EE

What is the current bug behavior?

Gitaly bootstrap sometimes fails with address already in use upon restart when using an omnibus-gitlab docker image.

What is the expected correct behavior?

Gitaly bootstrap succeeds.

Relevant logs and/or screenshots

The problem with gitaly seems to happen during the omnibus-gitlab startup process. It looks like it restarts all the services right at the end, including gitaly (from https://gitlab.com/gitlab-org/gitlab-qa/-/jobs/416011673):

[0mRecipe: <Dynamically Defined Resource>[0m
  * service[gitaly] action restart
    [32m- restart service service[gitaly][0m
[0mRecipe: gitaly::enable[0m
  * runit_service[gitaly] action hup
    [32m- send hup to runit_service[gitaly][0m

Logs for reconfigure show the timing:

[2020-01-27T13:46:36+00:00] INFO: env_dir[/opt/gitlab/etc/gitaly/env] sending restart action to service[gitaly] (delayed)
[2020-01-27T13:46:36+00:00] INFO: service[gitaly] restarted
[2020-01-27T13:46:36+00:00] INFO: template[Create Gitaly config.toml] sending hup action to runit_service[gitaly] (delayed)
[2020-01-27T13:46:36+00:00] INFO: runit_service[gitaly] signalled (HUP)
[2020-01-27T13:46:36+00:00] INFO: runit_service[gitaly] sent hup

And the corresponding gitaly logs look normal (AFAIK) right up until the end:

{"gitaly":434,"level":"warning","msg":"forwarding signal","signal":15,"time":"2020-01-27T13:46:36Z","wrapper":427}
{"gitaly":434,"level":"warning","msg":"forwarding signal","signal":18,"time":"2020-01-27T13:46:36Z","wrapper":427}
{"error":"signal: killed","level":"warning","msg":"exited","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","434","/var/opt/gitlab/gitaly/internal_sockets/ruby.0"],"supervisor.name":"gitaly-ruby.0","time":"2020-01-27T13:46:36Z"}
{"error":"signal: killed","level":"warning","msg":"exited","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","434","/var/opt/gitlab/gitaly/internal_sockets/ruby.1"],"supervisor.name":"gitaly-ruby.1","time":"2020-01-27T13:46:36Z"}
{"error":"received signal \"terminated\"","level":"error","msg":"shutting down","time":"2020-01-27T13:46:36Z"}
{"gitaly":434,"level":"warning","msg":"forwarding signal","signal":17,"time":"2020-01-27T13:46:36Z","wrapper":427}
{"error":"os: process already finished","gitaly":434,"level":"error","msg":"can't forward the signal","signal":17,"time":"2020-01-27T13:46:36Z","wrapper":427}
{"gitaly":434,"level":"error","msg":"wrapper for gitaly shutting down","time":"2020-01-27T13:46:36Z","wrapper":427}
{"level":"info","msg":"Wrapper started","time":"2020-01-27T13:46:36Z","wrapper":1318}
{"level":"info","msg":"finding gitaly","pid_file":"/var/opt/gitlab/gitaly/gitaly.pid","time":"2020-01-27T13:46:36Z","wrapper":1318}
{"level":"info","msg":"spawning a process","time":"2020-01-27T13:46:36Z","wrapper":1318}
time="2020-01-27T13:46:37Z" level=info msg="Starting Gitaly" version="Gitaly, version 1.83.0"
time="2020-01-27T13:46:37Z" level=warning msg="git path not configured. Using default path resolution" resolvedPath=/opt/gitlab/embedded/bin/git
time="2020-01-27T13:46:37Z" level=info msg="clearing disk cache object folder" storage=default
time="2020-01-27T13:46:37Z" level=info msg="moving disk cache object folder to /var/opt/gitlab/git-data/repositories/+gitaly/tmp/diskcache073245658" storage=default
time="2020-01-27T13:46:37Z" level=info msg="disk cache object folder doesn't exist, no need to remove" storage=default
time="2020-01-27T13:46:37Z" level=info msg="Starting file walker for /var/opt/gitlab/git-data/repositories/+gitaly/cache" storage=default
time="2020-01-27T13:46:37Z" level=info msg="Starting file walker for /var/opt/gitlab/git-data/repositories/+gitaly/state" storage=default
{"address":"/var/opt/gitlab/gitaly/gitaly.socket","level":"info","msg":"listening at unix address","time":"2020-01-27T13:46:37Z"}
{"address":"/var/opt/gitlab/gitaly/internal_sockets/internal.sock","level":"info","msg":"listening at unix address","time":"2020-01-27T13:46:37Z"}
{"level":"info","msg":"finished tempdir cleaner walk","storage":"default","time":"2020-01-27T13:46:37Z","time_ms":0}
{"level":"info","msg":"Wrapper started","time":"2020-01-27T13:46:38Z","wrapper":1356}
{"level":"info","msg":"finding gitaly","pid_file":"/var/opt/gitlab/gitaly/gitaly.pid","time":"2020-01-27T13:46:38Z","wrapper":1356}
{"level":"info","msg":"spawning a process","time":"2020-01-27T13:46:38Z","wrapper":1356}
{"gitaly":1366,"level":"info","msg":"monitoring gitaly","time":"2020-01-27T13:46:38Z","wrapper":1356}
time="2020-01-27T13:46:38Z" level=info msg="Starting Gitaly" version="Gitaly, version 1.83.0"
time="2020-01-27T13:46:38Z" level=warning msg="git path not configured. Using default path resolution" resolvedPath=/opt/gitlab/embedded/bin/git
time="2020-01-27T13:46:38Z" level=info msg="clearing disk cache object folder" storage=default
time="2020-01-27T13:46:38Z" level=info msg="moving disk cache object folder to /var/opt/gitlab/git-data/repositories/+gitaly/tmp/diskcache118588225" storage=default
time="2020-01-27T13:46:38Z" level=info msg="disk cache object folder doesn't exist, no need to remove" storage=default
time="2020-01-27T13:46:38Z" level=info msg="Starting file walker for /var/opt/gitlab/git-data/repositories/+gitaly/cache" storage=default
time="2020-01-27T13:46:38Z" level=info msg="Starting file walker for /var/opt/gitlab/git-data/repositories/+gitaly/state" storage=default
{"address":"localhost:9236","level":"info","msg":"starting prometheus listener","time":"2020-01-27T13:46:38Z"}
{"level":"info","msg":"starting RSS monitor","supervisor.name":"gitaly-ruby.1","supervisor.rss_threshold":209715200,"time":"2020-01-27T13:46:38Z"}
{"level":"info","msg":"starting RSS monitor","supervisor.name":"gitaly-ruby.0","supervisor.rss_threshold":209715200,"time":"2020-01-27T13:46:38Z"}
{"level":"warning","msg":"spawned","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","1325","/var/opt/gitlab/gitaly/internal_sockets/ruby.1"],"supervisor.name":"gitaly-ruby.1","supervisor.pid":1375,"time":"2020-01-27T13:46:38Z"}
{"level":"warning","msg":"spawned","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","1325","/var/opt/gitlab/gitaly/internal_sockets/ruby.0"],"supervisor.name":"gitaly-ruby.0","supervisor.pid":1376,"time":"2020-01-27T13:46:38Z"}
{"level":"info","msg":"PID 1376 BUNDLE_GEMFILE=/opt/gitlab/embedded/service/gitaly-ruby/Gemfile","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","1325","/var/opt/gitlab/gitaly/internal_sockets/ruby.0"],"supervisor.name":"gitaly-ruby.0","time":"2020-01-27T13:46:39Z"}
{"level":"info","msg":"PID 1375 BUNDLE_GEMFILE=/opt/gitlab/embedded/service/gitaly-ruby/Gemfile","supervisor.args":["bundle","exec","bin/ruby-cd","/var/opt/gitlab/gitaly","/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby","1325","/var/opt/gitlab/gitaly/internal_sockets/ruby.1"],"supervisor.name":"gitaly-ruby.1","time":"2020-01-27T13:46:39Z"}
{"level":"info","msg":"finished tempdir cleaner walk","storage":"default","time":"2020-01-27T13:46:40Z","time_ms":0}
{"address":"/var/opt/gitlab/gitaly/gitaly.socket","level":"info","msg":"listening at unix address","time":"2020-01-27T13:46:40Z"}
{"address":"/var/opt/gitlab/gitaly/internal_sockets/internal.sock","level":"info","msg":"listening at unix address","time":"2020-01-27T13:46:40Z"}
{"error":"unable to start the bootstrap: can't create new listener: listen tcp 127.0.0.1:9236: bind: address already in use","level":"error","msg":"shutting down","time":"2020-01-27T13:46:40Z"}

Here are all the logs from that job - elasticsearch_job.zip

And the logs from https://gitlab.com/gitlab-org/gitlab-qa/-/jobs/421069608 - ee_instance_job.zip

Output of checks

Results of GitLab environment info

From the omnibus-gitlab docker image used in https://gitlab.com/gitlab-org/gitlab-qa/-/jobs/421069608

Expand for output related to GitLab environment info
System information
System:		
Proxy:		no
Current User:	git
Using RVM:	no
Ruby Version:	2.6.5p114
Gem Version:	2.7.10
Bundler Version:1.17.3
Rake Version:	12.3.3
Redis Version:	5.0.7
Git Version:	2.24.1
Sidekiq Version:5.2.7
Go Version:	unknown

GitLab information
Version:	12.7.0-pre
Revision:	defea506b0f
Directory:	/opt/gitlab/embedded/service/gitlab-rails
DB Adapter:	PostgreSQL
DB Version:	10.9
URL:		http://gitlab.test
HTTP Clone URL:	http://gitlab.test/some-group/some-project.git
SSH Clone URL:	git@gitlab.test:some-group/some-project.git
Elasticsearch:	no
Geo:		no
Using LDAP:	no
Using Omniauth:	yes
Omniauth Providers: 

GitLab Shell
Version:	11.0.0
Repository storage paths:
- default: 	/var/opt/gitlab/git-data/repositories
GitLab Shell path:		/opt/gitlab/embedded/service/gitlab-shell
Git:		/opt/gitlab/embedded/bin/git

Results of GitLab application Check

Expand for output related to the GitLab application check
Checking GitLab subtasks ...

Checking GitLab Shell ...

GitLab Shell: ... GitLab Shell version >= 11.0.0 ? ... OK (11.0.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful

Checking GitLab Shell ... Finished

Checking Gitaly ...

Gitaly: ... default ... OK

Checking Gitaly ... Finished

Checking Sidekiq ...

Sidekiq: ... Running? ... yes
Number of Sidekiq processes ... 1

Checking Sidekiq ... Finished

Checking Incoming Email ...

Incoming Email: ... Reply by email is disabled in config/gitlab.yml

Checking Incoming Email ... Finished

Checking LDAP ...

LDAP: ... LDAP is disabled in config/gitlab.yml

Checking LDAP ... Finished

Checking GitLab App ...

Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... skipped (no tmp uploads folder yet)
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ... can't check, you have no projects
Redis version >= 2.8.0? ... yes
Ruby version >= 2.5.3 ? ... yes (2.6.5)
Git version >= 2.22.0 ? ... yes (2.24.1)
Git user has default SSH configuration? ... yes
Active users: ... 1
Is authorized keys file accessible? ... yes
Elasticsearch version 5.6 - 6.x? ... skipped (elasticsearch is disabled)

Checking GitLab App ... Finished


Checking GitLab subtasks ... Finished
Edited by Mark Lapierre