Unix socket errors when FF_NETWORK_PER_BUILD is enabled on the Shared Runners

Summary

During the rollout of the FF_NETWORK_PER_BUILD feature flag in Production via gitlab-com/gl-infra&905, specifically to the Shared Runners, a number of users reported unexpected job failures: https://gitlab.com/gitlab-com/ops-sub-department/section-ops-request-for-help/-/issues/91.

Steps to reproduce

I was unable to reproduce the bug on any of the other runners that have the feature flag enabled in Production. See https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/17541.

Actual behavior

Jobs fail with the error: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Expected behavior

Jobs that talk to the Docker daemon pass successfully.

Relevant logs and/or screenshots

job log
Running with gitlab-runner 15.9.0~beta.115.g598a7c91 (598a7c91)
  on blue-1.shared.runners-manager.gitlab.com/default j1aLDqxS, system ID: s_b437a71a38f9
  feature flags: FF_NETWORK_PER_BUILD:true
...

$ docker pull $REGISTRY_PATH:$DOCKER_IMAGE_TAG
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: exit code 1

See https://gitlab.com/gitlab-com/ops-sub-department/section-ops-request-for-help/-/issues/91 for more details.

Environment description

GitLab SaaS shared Linux runners.

Used GitLab Runner version

Running with gitlab-runner 15.9.0~beta.115.g598a7c91 (598a7c91)

Possible fixes

Turn off the FF_NETWORK_PER_BUILD feature flag in the runner's config.toml.
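
For reference, a minimal sketch of what that could look like, assuming the flag is set per runner through the [runners.feature_flags] table that GitLab Runner supports in config.toml. The runner name and executor below are placeholders, not the actual Shared Runners configuration:

```toml
# Hypothetical runner entry; name and executor are illustrative only.
[[runners]]
  name = "example-shared-runner"
  executor = "docker+machine"

  # Explicitly disable the per-build network feature for this runner.
  [runners.feature_flags]
    FF_NETWORK_PER_BUILD = false
```

If the flag is instead being set through an environment entry on the runner, that entry would need to be removed or set to false as well.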

Users should also be able to override the feature flag in .gitlab-ci.yml, but that is not currently working as expected; see #30884 (closed).