GitLab Runner cannot connect to the Docker daemon

I have four GitLab Runners running on different k8s clusters, and recently my CI jobs started failing intermittently because the runner could not access the Docker daemon. Retrying a job usually made it pass after a few attempts, but now the issue persists and none of my CI jobs succeed.

To be specific, here is a typical failing job (#322282760, triggered by Sami Jaghouar):

Running with gitlab-runner 12.3.0 (a8a019e0)
  on docker-auto-scale 0277ea0f
Using Docker executor with image docker:stable ...
Starting service docker:dind ...
Pulling docker image docker:dind ...
Using docker image sha256:0891431bfc89e11908174cc2c0fc1157c930bd74ace8b2a8134067a3628e4116 for docker:dind ...
Waiting for services to be up and running...

*** WARNING: Service runner-0277ea0f-project-10608629-concurrent-0-docker-0 probably didn't start properly.

Health check error:
service "runner-0277ea0f-project-10608629-concurrent-0-docker-0-wait-for-service" timeout

Health check container logs:


Service container logs:
2019-10-16T08:32:49.852883392Z time="2019-10-16T08:32:49.852646852Z" level=info msg="Starting up"
2019-10-16T08:32:49.877502067Z time="2019-10-16T08:32:49.877294796Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
2019-10-16T08:32:49.877954166Z time="2019-10-16T08:32:49.877886485Z" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting --tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]"
2019-10-16T08:32:49.879276103Z time="2019-10-16T08:32:49.879190595Z" level=info msg="libcontainerd: started new containerd process" pid=18
2019-10-16T08:32:49.879394702Z time="2019-10-16T08:32:49.879357400Z" level=info msg="parsed scheme: \"unix\"" module=grpc
2019-10-16T08:32:49.879484679Z time="2019-10-16T08:32:49.879452126Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
2019-10-16T08:32:49.879578627Z time="2019-10-16T08:32:49.879541411Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
2019-10-16T08:32:49.879624053Z time="2019-10-16T08:32:49.879598321Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
2019-10-16T08:32:49.902331244Z time="2019-10-16T08:32:49.902137932Z" level=info msg="starting containerd" revision=b34a5c8af56e510852c35414db4c1f4fa6172339 version=v1.2.10 
2019-10-16T08:32:49.902799776Z time="2019-10-16T08:32:49.902739683Z" level=info msg="loading plugin "io.containerd.content.v1.content"..." type=io.containerd.content.v1 
2019-10-16T08:32:49.902985744Z time="2019-10-16T08:32:49.902943559Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.btrfs"..." type=io.containerd.snapshotter.v1 
2019-10-16T08:32:49.903312545Z time="2019-10-16T08:32:49.903254549Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.btrfs" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" 
2019-10-16T08:32:49.903387024Z time="2019-10-16T08:32:49.903348748Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.aufs"..." type=io.containerd.snapshotter.v1 
2019-10-16T08:32:49.908826704Z time="2019-10-16T08:32:49.908673547Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.aufs" error="modprobe aufs failed: "ip: can't find device 'aufs'\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n": exit status 1" 
2019-10-16T08:32:49.908935814Z time="2019-10-16T08:32:49.908903904Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.native"..." type=io.containerd.snapshotter.v1 
2019-10-16T08:32:49.909170348Z time="2019-10-16T08:32:49.909126727Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.overlayfs"..." type=io.containerd.snapshotter.v1 
2019-10-16T08:32:49.909426579Z time="2019-10-16T08:32:49.909361477Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.zfs"..." type=io.containerd.snapshotter.v1 
2019-10-16T08:32:49.909794470Z time="2019-10-16T08:32:49.909747861Z" level=info msg="skip loading plugin "io.containerd.snapshotter.v1.zfs"..." type=io.containerd.snapshotter.v1 
2019-10-16T08:32:49.909842434Z time="2019-10-16T08:32:49.909818359Z" level=info msg="loading plugin "io.containerd.metadata.v1.bolt"..." type=io.containerd.metadata.v1 
2019-10-16T08:32:49.909973450Z time="2019-10-16T08:32:49.909933928Z" level=warning msg="could not use snapshotter aufs in metadata plugin" error="modprobe aufs failed: "ip: can't find device 'aufs'\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n": exit status 1" 
2019-10-16T08:32:49.910027660Z time="2019-10-16T08:32:49.909993983Z" level=warning msg="could not use snapshotter zfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" 
2019-10-16T08:32:49.910097899Z time="2019-10-16T08:32:49.910063825Z" level=warning msg="could not use snapshotter btrfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" 
2019-10-16T08:32:49.917833728Z time="2019-10-16T08:32:49.917691796Z" level=info msg="loading plugin "io.containerd.differ.v1.walking"..." type=io.containerd.differ.v1 
2019-10-16T08:32:49.917945485Z time="2019-10-16T08:32:49.917907697Z" level=info msg="loading plugin "io.containerd.gc.v1.scheduler"..." type=io.containerd.gc.v1 
2019-10-16T08:32:49.918074740Z time="2019-10-16T08:32:49.918040632Z" level=info msg="loading plugin "io.containerd.service.v1.containers-service"..." type=io.containerd.service.v1 
2019-10-16T08:32:49.918128215Z time="2019-10-16T08:32:49.918104446Z" level=info msg="loading plugin "io.containerd.service.v1.content-service"..." type=io.containerd.service.v1 
2019-10-16T08:32:49.918194465Z time="2019-10-16T08:32:49.918162972Z" level=info msg="loading plugin "io.containerd.service.v1.diff-service"..." type=io.containerd.service.v1 
2019-10-16T08:32:49.918245857Z time="2019-10-16T08:32:49.918222013Z" level=info msg="loading plugin "io.containerd.service.v1.images-service"..." type=io.containerd.service.v1 
2019-10-16T08:32:49.918315566Z time="2019-10-16T08:32:49.918284465Z" level=info msg="loading plugin "io.containerd.service.v1.leases-service"..." type=io.containerd.service.v1 
2019-10-16T08:32:49.918363534Z time="2019-10-16T08:32:49.918340675Z" level=info msg="loading plugin "io.containerd.service.v1.namespaces-service"..." type=io.containerd.service.v1 
2019-10-16T08:32:49.918468414Z time="2019-10-16T08:32:49.918433494Z" level=info msg="loading plugin "io.containerd.service.v1.snapshots-service"..." type=io.containerd.service.v1 
2019-10-16T08:32:49.918523689Z time="2019-10-16T08:32:49.918494522Z" level=info msg="loading plugin "io.containerd.runtime.v1.linux"..." type=io.containerd.runtime.v1 
2019-10-16T08:32:49.918880449Z time="2019-10-16T08:32:49.918826481Z" level=info msg="loading plugin "io.containerd.runtime.v2.task"..." type=io.containerd.runtime.v2 
2019-10-16T08:32:49.919090864Z time="2019-10-16T08:32:49.919051966Z" level=info msg="loading plugin "io.containerd.monitor.v1.cgroups"..." type=io.containerd.monitor.v1 
2019-10-16T08:32:49.919628060Z time="2019-10-16T08:32:49.919575263Z" level=info msg="loading plugin "io.containerd.service.v1.tasks-service"..." type=io.containerd.service.v1 
2019-10-16T08:32:49.919735485Z time="2019-10-16T08:32:49.919699381Z" level=info msg="loading plugin "io.containerd.internal.v1.restart"..." type=io.containerd.internal.v1 
2019-10-16T08:32:49.919864709Z time="2019-10-16T08:32:49.919829606Z" level=info msg="loading plugin "io.containerd.grpc.v1.containers"..." type=io.containerd.grpc.v1 
2019-10-16T08:32:49.919915222Z time="2019-10-16T08:32:49.919891086Z" level=info msg="loading plugin "io.containerd.grpc.v1.content"..." type=io.containerd.grpc.v1 
2019-10-16T08:32:49.919982347Z time="2019-10-16T08:32:49.919951437Z" level=info msg="loading plugin "io.containerd.grpc.v1.diff"..." type=io.containerd.grpc.v1 
2019-10-16T08:32:49.920052683Z time="2019-10-16T08:32:49.920012343Z" level=info msg="loading plugin "io.containerd.grpc.v1.events"..." type=io.containerd.grpc.v1 
2019-10-16T08:32:49.920111150Z time="2019-10-16T08:32:49.920079703Z" level=info msg="loading plugin "io.containerd.grpc.v1.healthcheck"..." type=io.containerd.grpc.v1 
2019-10-16T08:32:49.920160469Z time="2019-10-16T08:32:49.920135560Z" level=info msg="loading plugin "io.containerd.grpc.v1.images"..." type=io.containerd.grpc.v1 
2019-10-16T08:32:49.920227913Z time="2019-10-16T08:32:49.920196822Z" level=info msg="loading plugin "io.containerd.grpc.v1.leases"..." type=io.containerd.grpc.v1 
2019-10-16T08:32:49.920273668Z time="2019-10-16T08:32:49.920250609Z" level=info msg="loading plugin "io.containerd.grpc.v1.namespaces"..." type=io.containerd.grpc.v1 
2019-10-16T08:32:49.920344298Z time="2019-10-16T08:32:49.920315197Z" level=info msg="loading plugin "io.containerd.internal.v1.opt"..." type=io.containerd.internal.v1 
2019-10-16T08:32:49.921617889Z time="2019-10-16T08:32:49.921543934Z" level=info msg="loading plugin "io.containerd.grpc.v1.snapshots"..." type=io.containerd.grpc.v1 
2019-10-16T08:32:49.921714918Z time="2019-10-16T08:32:49.921658022Z" level=info msg="loading plugin "io.containerd.grpc.v1.tasks"..." type=io.containerd.grpc.v1 
2019-10-16T08:32:49.921767397Z time="2019-10-16T08:32:49.921744036Z" level=info msg="loading plugin "io.containerd.grpc.v1.version"..." type=io.containerd.grpc.v1 
2019-10-16T08:32:49.921836874Z time="2019-10-16T08:32:49.921802760Z" level=info msg="loading plugin "io.containerd.grpc.v1.introspection"..." type=io.containerd.grpc.v1 
2019-10-16T08:32:49.922261258Z time="2019-10-16T08:32:49.922198896Z" level=info msg=serving... address="/var/run/docker/containerd/containerd-debug.sock" 
2019-10-16T08:32:49.922431009Z time="2019-10-16T08:32:49.922369071Z" level=info msg=serving... address="/var/run/docker/containerd/containerd.sock" 
2019-10-16T08:32:49.922507298Z time="2019-10-16T08:32:49.922454267Z" level=info msg="containerd successfully booted in 0.021011s" 
2019-10-16T08:32:49.948163831Z time="2019-10-16T08:32:49.947941439Z" level=info msg="Setting the storage driver from the $DOCKER_DRIVER environment variable (overlay2)"
2019-10-16T08:32:49.948701721Z time="2019-10-16T08:32:49.948610191Z" level=info msg="parsed scheme: \"unix\"" module=grpc
2019-10-16T08:32:49.948792062Z time="2019-10-16T08:32:49.948730237Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
2019-10-16T08:32:49.948908264Z time="2019-10-16T08:32:49.948846691Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
2019-10-16T08:32:49.949012675Z time="2019-10-16T08:32:49.948964773Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
2019-10-16T08:32:49.961930530Z time="2019-10-16T08:32:49.961726233Z" level=info msg="parsed scheme: \"unix\"" module=grpc
2019-10-16T08:32:49.962030996Z time="2019-10-16T08:32:49.961966516Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
2019-10-16T08:32:49.962139242Z time="2019-10-16T08:32:49.962071926Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
2019-10-16T08:32:49.962221541Z time="2019-10-16T08:32:49.962158164Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
2019-10-16T08:32:49.998354926Z time="2019-10-16T08:32:49.998131528Z" level=info msg="Loading containers: start."
2019-10-16T08:32:50.043487201Z time="2019-10-16T08:32:50.042170007Z" level=warning msg="Running modprobe bridge br_netfilter failed with message: ip: can't find device 'bridge'\nbridge                167936  1 br_netfilter\nstp                    16384  1 bridge\nllc                    16384  2 bridge,stp\nip: can't find device 'br_netfilter'\nbr_netfilter           24576  0 \nbridge                167936  1 br_netfilter\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n, error: exit status 1"
2019-10-16T08:32:50.149560013Z time="2019-10-16T08:32:50.147337934Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.18.0.0/16. Daemon option --bip can be used to set a preferred IP address"
2019-10-16T08:32:50.206602955Z time="2019-10-16T08:32:50.205342087Z" level=info msg="Loading containers: done."
2019-10-16T08:32:50.236589625Z time="2019-10-16T08:32:50.236083666Z" level=info msg="Docker daemon" commit=a872fc2f86 graphdriver(s)=overlay2 version=19.03.3
2019-10-16T08:32:50.236617049Z time="2019-10-16T08:32:50.236285714Z" level=info msg="Daemon has completed initialization"
2019-10-16T08:32:50.319571062Z time="2019-10-16T08:32:50.319171883Z" level=info msg="API listen on [::]:2375"
2019-10-16T08:32:50.319609971Z time="2019-10-16T08:32:50.319301125Z" level=info msg="API listen on /var/run/docker.sock"

*********

Pulling docker image docker:stable ...
Using docker image sha256:23fb2c9b38b59433ea1913eafa12d2e15651ca0d08819dc7067d27d8f92e0428 for docker:stable ...
Running on runner-0277ea0f-project-10608629-concurrent-0 via runner-0277ea0f-srm-1571214656-10c97669...
Fetching changes...
Initialized empty Git repository in /builds/project/.git/
Created fresh repository.

 * [new ref]         refs/pipelines/88853005 -> refs/pipelines/88853005
 * [new branch]      develop                 -> origin/develop
 * [new branch]      master                  -> origin/master
 * [new branch]      release/1.1.0           -> origin/release/1.1.0
 * [new branch]      release/1.2.0           -> origin/release/1.2.0
 * [new branch]      release/1.3.0           -> origin/release/1.3.0
 * [new branch]      release/1.3.1           -> origin/release/1.3.1
 * [new tag]         1.0.0                   -> 1.0.0
Checking out 538c7e4c as release/1.3.1...

Skipping Git submodules setup
$ version=${CI_COMMIT_REF_NAME##*/}
$ docker build -t ${image_name}:rc-${version}-${CI_COMMIT_SHORT_SHA} .
time="2019-10-16T08:33:38Z" level=error msg="failed to dial gRPC: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: dial tcp [::1]:2375: connect: connection refused"
error during connect: Post http://localhost:2375/v1.40/build?buildargs=%7B%7D&cachefrom=%5B%5D&cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=Dockerfile&labels=%7B%7D&memory=0&memswap=0&networkmode=default&rm=1&session=om2twb0wntkci52gwyxckhwzr&shmsize=0&t=eu.gcr.io%2216913%2Fapi%3Arc-1.3.1-538c7e4c&target=&ulimits=null&version=1: context canceled
ERROR: Job failed: exit code 1
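In this log the daemon in the `docker:dind` service reports `API listen on [::]:2375`, yet the first Docker command in the script dials `[::1]:2375` (localhost) and gets `connection refused`. A minimal sketch of a pre-build check could help narrow this down, assuming the job runs under the POSIX `sh` of the `docker:stable` image; `wait_for_docker` and `image_tag` are hypothetical helper names, not GitLab or Docker features. It prints which endpoint the Docker CLI targets, retries `docker info` a bounded number of times, and reproduces the tag expansion from the script (`${CI_COMMIT_REF_NAME##*/}` turns `release/1.3.1` into `1.3.1`).

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for a job script, not GitLab/Docker features.

# Poll the daemon the Docker CLI is configured to use until it answers.
# Usage: wait_for_docker [attempts]   (default 30, about one second apart)
wait_for_docker() {
  attempts=${1:-30}
  # Without DOCKER_HOST, the CLI normally uses the local Unix socket.
  echo "Docker client target: ${DOCKER_HOST:-unix:///var/run/docker.sock}"
  n=1
  while ! docker info >/dev/null 2>&1; do
    if [ "$n" -ge "$attempts" ]; then
      echo "daemon still unreachable after $attempts attempts" >&2
      return 1
    fi
    n=$((n + 1))
    sleep 1
  done
  echo "daemon reachable after $n attempt(s)"
}

# Same expansion as the job: drop everything up to the last "/",
# so "release/1.3.1" becomes "1.3.1" and "develop" stays "develop".
# Usage: image_tag <ref-name> <short-sha>
image_tag() {
  printf 'rc-%s-%s\n' "${1##*/}" "$2"
}
```

Once the functions are defined in the job shell (for example by sourcing the file), the build step might become `wait_for_docker 30 && docker build -t "${image_name}:$(image_tag "$CI_COMMIT_REF_NAME" "$CI_COMMIT_SHORT_SHA")" .`. Comparing the printed target with the address the daemon listens on helps tell a slow-starting daemon apart from a client pointed at the wrong endpoint.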

What can I do to fix this?