Unix socket errors when FF_NETWORK_PER_BUILD is enabled on the Shared Runners
Summary
During the rollout of this feature in Production via gitlab-com/gl-infra&905, specifically to the Shared Runners, a number of users reported unexpected job failures: https://gitlab.com/gitlab-com/ops-sub-department/section-ops-request-for-help/-/issues/91.
Steps to reproduce
I was unable to reproduce the bug on any of the other runners that have the feature enabled in Production. See https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/17541.
Actual behavior
Jobs fail with the error: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Expected behavior
Jobs pass successfully.
Relevant logs and/or screenshots
job log
Running with gitlab-runner 15.9.0~beta.115.g598a7c91 (598a7c91)
on blue-1.shared.runners-manager.gitlab.com/default j1aLDqxS, system ID: s_b437a71a38f9
feature flags: FF_NETWORK_PER_BUILD:true
...
$ docker pull $REGISTRY_PATH:$DOCKER_IMAGE_TAG
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: exit code 1
See https://gitlab.com/gitlab-com/ops-sub-department/section-ops-request-for-help/-/issues/91 for more details.
Environment description
GitLab SaaS shared Linux runners.
Used GitLab Runner version
Running with gitlab-runner 15.9.0~beta.115.g598a7c91 (598a7c91)
Possible fixes
Turn off the feature in config.toml.
Users should also be able to override the feature flag in .gitlab-ci.yml, but that's not currently working as expected; see #30884 (closed).
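For reference, a minimal sketch of what turning the flag off in config.toml could look like, using the runner-level feature_flags table. The runner name below is a placeholder and not taken from the Shared Runners configuration. The rest of the [[runners]] entry (URL, token, executor settings) is omitted and assumed to stay as it is.

```toml
# Fragment of config.toml; only the feature-flag part is shown.
[[runners]]
  name = "example-shared-runner"  # hypothetical name, for illustration only

  # Runner-level feature flag settings.
  [runners.feature_flags]
    FF_NETWORK_PER_BUILD = false  # disable the per-build network for jobs on this runner
```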