regression: gitlab-runner 11.11.x fails to mount /var/run/docker.sock
Summary
After upgrading from 11.10 to 11.11, gitlab-runner fails to mount /var/run/docker.sock,
resulting in failed jobs.
Steps to reproduce
- Our gitlab-ee instance (on Debian jessie) was upgraded from 11.10 to 11.11.0 on Sunday, May 26th, 2019.
- Our gitlab-runner-01 instance (and the other runners, all on Debian stretch) were upgraded afterwards from 11.10.1 to 11.11.1.
The following job now fails (parts were removed for brevity):
image: julienlecomte/docker-make
stages:
  - stage-1
variables:
  DOCKER_TAG: $CI_COMMIT_REF_NAME
  DOCKERFLAGS: "--pull"
# ------------------------------------------------------------------------------
# common items:
.defaults:
  tags:
    - dind
  services:
    - docker:dind
  script:
    - (...)
  after_script:
    - (...)
docker-images/base:
  extends: .defaults
  stage: stage-1
  variables:
    SUBDIR: base
Our runner configuration file:
concurrent = 16
check_interval = 0
listen_address = ":9252"
[session_server]
  session_timeout = 1800
[[runners]]
  name = "gitlab-runner-01"
  url = "https://xyz.example.com/"
  token = "XXXXXXXX"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "julienlecomte/docker-make"
    privileged = true
    disable_entrypoint_overwrite = false
    disable_cache = true
    volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]
    shm_size = 0
    memory = "512m"
    memory_swap = "4g"
    oom_kill_disable = true
    cpus = "2"
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
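The part of this configuration that looks relevant is the `volumes` entry, which bind-mounts the host's /var/run/docker.sock at the same path the docker:dind service uses for its own socket. To check other runners for the same setting, something like the sketch below might help. The function name `mounts_docker_sock` is made up for illustration, and whether this bind mount is what actually triggers the failure in 11.11 is an assumption, not something confirmed here.

```shell
# Hypothetical helper: exits 0 when the given runner config file contains a
# `volumes = [...]` line that bind-mounts the host Docker socket, 1 otherwise.
mounts_docker_sock() {
    grep -Eq '^[[:space:]]*volumes[[:space:]]*=.*"/var/run/docker\.sock:' "$1"
}
```

On a packaged install the config is usually /etc/gitlab-runner/config.toml, so a call could look like `mounts_docker_sock /etc/gitlab-runner/config.toml && echo "host socket is bind-mounted"`.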
What is the current bug behavior?
The job fails; the docker:dind service cannot create /var/run/docker.sock:
Running with gitlab-runner 11.11.1 (5a147c92)
on gitlab-runner-01 XXXXXXXX
Using Docker executor with image julienlecomte/docker-make ...
Starting service docker:dind ...
Pulling docker image docker:dind ...
Using docker image sha256:bed64de70fa1f4d0b5a498791647c45d954cb0306ec2852dbcfb956f4ff3b0d6 for docker:dind ...
Waiting for services to be up and running...
*** WARNING: Service runner-XXXXXXXX-project-274-concurrent-0-docker-0 probably didn't start properly.
Health check error:
service "runner-XXXXXXXX-project-274-concurrent-0-docker-0-wait-for-service" timeout
Health check container logs:
Service container logs:
2019-05-29T07:30:41.311369337Z time="2019-05-29T07:30:41.311042012Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
2019-05-29T07:30:41.311431918Z Failed to load listeners: can't create unix socket /var/run/docker.sock: device or resource busy
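The line that matters in the log above is the last one: the dind daemon cannot create its socket because the path is already busy. To find other affected jobs, a saved job log can be searched for that message. This is only a sketch; the function name `has_dind_socket_conflict` is made up for illustration, and it assumes the job log has been saved to a plain-text file.

```shell
# Hypothetical helper: exits 0 when the given job log file contains the
# dind "device or resource busy" socket error quoted in this report.
has_dind_socket_conflict() {
    grep -Fq "can't create unix socket /var/run/docker.sock: device or resource busy" "$1"
}
```

For example, `has_dind_socket_conflict job.log && echo "affected"`, where job.log is whatever file the log was saved to.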
What is the expected correct behavior?
The job should succeed as it did with the 11.10 runners.
After the runners were downgraded to 11.10, things went back to normal. Downgrading from 11.11.1 to 11.11.0 did not fix the problem.
Relevant logs and/or screenshots
# As root@gitlab-runner-01:
$ ls -la /var/run/docker.sock
srw-rw---- 1 root docker 0 May 20 13:35 /var/run/docker.sock
$ grep docker /etc/group
docker:x:999:gitlab-runner
$ grep gitlab-runner /etc/group
docker:x:999:gitlab-runner
gitlab-runner:x:998:
$ id -a gitlab-runner
uid=999(gitlab-runner) gid=998(gitlab-runner) groups=998(gitlab-runner),999(docker)
$ docker --version
Docker version 18.09.6, build 481bc77
$ docker images
REPOSITORY                                     TAG               IMAGE ID       CREATED        SIZE
julienlecomte/docker-make                      latest            ee941d7ee7e7   3 hours ago    198MB
gitlab/gitlab-runner-helper                    x86_64-5a147c92   1f73b41f9007   47 hours ago   52.4MB
gitlab/gitlab-runner-helper                    x86_64-6c154264   9519e0be7ab5   4 days ago     52.4MB
gitlab/gitlab-runner-helper                    x86_64-1f513601   2362143044ae   8 days ago     52.4MB
docker                                         dind              bed64de70fa1   2 weeks ago    183MB
gitlab/gitlab-runner-helper                    x86_64-3001a600   9ddf7cc7027d   5 weeks ago    52.4MB
gitlab/gitlab-runner-helper                    x86_64-692ae235   139e69e64a47   2 months ago   52.3MB
gitlab/gitlab-runner-helper                    x86_64-4745a6f3   3870045c50da   2 months ago   49.2MB
gcr.io/gcp-runtimes/container-structure-test   latest            adb14de2bc72   3 months ago   36.8MB
Workaround
Downgrading runners to 11.10.
Edited by Julien Lecomte