Runner system failure

Summary

A runner system failure is observed when multiple Docker runners are active concurrently.

What is the current bug behavior?

Runners work perfectly when jobs are triggered one at a time with no jobs in the queue. As soon as jobs are waiting in the queue, runners start failing, and they keep failing until the queue is empty. Once the number of jobs equals the number of runners, all runners finish as expected.

Running with gitlab-runner 11.1.0 (081978aa)
  on docker-runner-2 ff9e524d
Using Docker executor with image debian:stretch ...
Pulling docker image debian:stretch ...
Using docker image sha256:3bbb526d26083e7a65a7a112ed72e1ec58e81384412f2d3fcdbbd87d49fd588d for debian:stretch ...
Running on runner-ff9e524d-project-5-concurrent-0 via dfe18fd43fb0...
Cloning repository...
Cloning into '/builds/robotics/code/beagle_black'...
Checking out 80fcbd9b as master...
Skipping Git submodules setup
$ sh scripts/installation.sh

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Get:2 http://security.debian.org/debian-security stretch/updates InRelease [94.3 kB]
Ign:1 http://cdn-fastly.deb.debian.org/debian stretch InRelease
Get:3 http://cdn-fastly.deb.debian.org/debian stretch-updates InRelease [91.0 kB]
ERROR: Job failed (system failure): Error: No such container: dd70c68ba335678a6fe439f3c8084b08ac013bd9d448b320f34724652a45d768 (executor_docker.go:965:0s)

Another log, this time with a previously unseen warning:

Running with gitlab-runner 11.1.0 (081978aa)
  on docker-runner-1 77b96ed0
Using Docker executor with image ubuntu:18.04 ...
Pulling docker image ubuntu:18.04 ...
Using docker image sha256:16508e5c265dcb5c05017a2a8a8228ae12b7b56b2cda0197ed5411bda200a961 for ubuntu:18.04 ...
Running on runner-77b96ed0-project-5-concurrent-0 via 77c099be3094...
Cloning repository...
Cloning into '/builds/robotics/code/beagle_black'...
Checking out 80fcbd9b as master...
Skipping Git submodules setup
WARNING: Possibly zombie container runner-77b96ed0-project-5-concurrent-0-build-4 is disconnected from network gitlab-network
$ sh scripts/installation.sh

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [83.2 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic InRelease [242 kB]
ERROR: Job failed (system failure): Error: No such container: 7decabe3805a7d0faf9a51d385cf32d71e91cfc5cbc7b9d1155e314f9d8dd352 (executor_docker.go:965:0s)

This error includes some additional information that may be useful:

Running with gitlab-runner 11.1.0 (081978aa)
  on docker-runner-1 77b96ed0
Using Docker executor with image debian:stretch ...
Pulling docker image debian:stretch ...
Using docker image sha256:3bbb526d26083e7a65a7a112ed72e1ec58e81384412f2d3fcdbbd87d49fd588d for debian:stretch ...
ERROR: Job failed (system failure): Error response from daemon: Conflict. The container name "/runner-77b96ed0-project-5-concurrent-0-predefined-0" is already in use by container "97da1d8b016ba3b0bd3ea2d19ed2b42e1e6b06758604c8ef19b41eb3ab3d04fb". You have to remove (or rename) that container to be able to reuse that name. (executor_docker.go:922:0s)
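
The container names in the logs above appear to follow the pattern runner-<short token>-project-<project id>-concurrent-<slot>, optionally followed by -<kind>-<index>. That pattern is inferred from these log lines only, not confirmed against the gitlab-runner source. Under that assumption, a minimal Python sketch (the helper names `parse_container_name` and `find_collisions` are hypothetical) shows why two jobs that resolve to the same token, project and slot on one Docker daemon would request the same container name, which matches the "already in use" conflict:

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional

# Pattern inferred from the names in the logs above; this is an assumption,
# not taken from the gitlab-runner source.
NAME_RE = re.compile(
    r"^/?runner-(?P<token>[0-9a-f]+)-project-(?P<project>\d+)"
    r"-concurrent-(?P<slot>\d+)(?:-(?P<kind>[a-z]+)-(?P<index>\d+))?$"
)


class ContainerName(NamedTuple):
    token: str            # short runner token, e.g. "77b96ed0"
    project: int          # project id, e.g. 5
    slot: int             # concurrent slot, e.g. 0
    kind: Optional[str]   # e.g. "build" or "predefined", if present
    index: Optional[int]  # trailing number after the kind, if present


def parse_container_name(name: str) -> Optional[ContainerName]:
    """Split a container name into its parts, or return None if it does not match."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    index = m.group("index")
    return ContainerName(
        token=m.group("token"),
        project=int(m.group("project")),
        slot=int(m.group("slot")),
        kind=m.group("kind"),
        index=int(index) if index is not None else None,
    )


def find_collisions(names: Iterable[str]) -> List[str]:
    """Return names requested more than once (leading '/' ignored), sorted."""
    counts = Counter(n.lstrip("/") for n in names)
    return sorted(n for n, c in counts.items() if c > 1)


if __name__ == "__main__":
    requested = [
        "/runner-77b96ed0-project-5-concurrent-0-predefined-0",
        "runner-77b96ed0-project-5-concurrent-0-predefined-0",
        "runner-77b96ed0-project-5-concurrent-0-build-4",
    ]
    print(parse_container_name(requested[0]))
    print(find_collisions(requested))
```

Feeding it the container names requested by each job (for example, collected from the job logs) would show whether two jobs were assigned the same slot. It does not explain why that happens.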

Also observed: "ERROR: Job failed: exit code 137", as referenced in another issue (gitlab-com/infrastructure, #2289).
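
For context on the exit code: by the usual shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9 (SIGKILL). With Docker this is commonly seen when a container is killed from outside, for example by the kernel's out-of-memory killer or by a forced stop or removal. The logs here do not show which of these applies. A minimal Python sketch of the decoding on a POSIX system (the function name is hypothetical):

```python
import signal
from typing import Optional


def signal_from_exit_code(code: int) -> Optional[str]:
    """Return the signal name implied by an exit status above 128, else None."""
    if code <= 128:
        return None  # ordinary exit status, not a signal-terminated process
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None  # no signal with that number on this platform


if __name__ == "__main__":
    print(signal_from_exit_code(137))
```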

What is the expected correct behavior?

The pipeline should have gone through without a runner failure.

Results of GitLab environment info

System information
System:
Current User:   git
Using RVM:      no
Ruby Version:   2.4.4p296
Gem Version:    2.7.6
Bundler Version:1.16.2
Rake Version:   12.3.1
Redis Version:  3.2.11
Git Version:    2.17.1
Sidekiq Version:5.1.3
Go Version:     unknown

GitLab information
Version:        11.1.4
Revision:       63daf37
Directory:      /opt/gitlab/embedded/service/gitlab-rails
DB Adapter:     postgresql
URL:            http://git.custom-domain
HTTP Clone URL: http://git.custom-domain/some-group/some-project.git
SSH Clone URL:  git@git.custom-domain:some-group/some-project.git
Using LDAP:     no
Using Omniauth: no

GitLab Shell
Version:        7.1.4
Repository storage paths:
  • default:    /var/opt/gitlab/git-data/repositories
Hooks:          /opt/gitlab/embedded/service/gitlab-shell/hooks
Git:            /opt/gitlab/embedded/bin/git

Results of GitLab application Check

Checking GitLab Shell ...

GitLab Shell version >= 7.1.4 ? ... OK (7.1.4)
Repo base directory exists?
default... yes
Repo storage directories are symlinks?
default... no
Repo paths owned by git:root, or git:git?
default... yes
Repo paths access is drwxrws---?
default... yes
hooks directories in repos are links: ...
5/1 ... ok
5/2 ... ok
5/4 ... ok
5/5 ... ok
5/6 ... ok
5/7 ... ok
5/8 ... ok
3/10 ... ok
3/11 ... ok
12/12 ... ok
11/13 ... ok
13/14 ... ok
11/15 ... ok
5/16 ... ok
12/17 ... ok
12/19 ... ok
2/20 ... ok
5/21 ... ok
5/22 ... repository is empty
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Check GitLab API access: OK
Redis available via internal API: OK
Access to /var/opt/gitlab/.ssh/authorized_keys: OK
gitlab-shell self-check successful

Checking GitLab Shell ... Finished

Checking Sidekiq ...

Running? ... yes
Number of Sidekiq processes ... 1

Checking Sidekiq ... Finished

Reply by email is disabled in config/gitlab.yml

Checking LDAP ...

LDAP is disabled in config/gitlab.yml

Checking LDAP ... Finished

Checking GitLab ...

Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ...
5/1 ... yes
5/2 ... yes
5/4 ... yes
5/5 ... yes
5/6 ... yes
5/7 ... yes
5/8 ... yes
3/10 ... yes
3/11 ... yes
12/12 ... yes
11/13 ... yes
13/14 ... yes
11/15 ... yes
5/16 ... yes
12/17 ... yes
12/19 ... yes
2/20 ... yes
5/21 ... yes
5/22 ... yes
Redis version >= 2.8.0? ... yes
Ruby version >= 2.3.5 ? ... yes (2.4.4)
Git version >= 2.9.5 ? ... yes (2.17.1)
Git user has default SSH configuration? ... yes
Active users: ... 7

Checking GitLab ... Finished

Edited by Kunal Tyagi