Some Docker GitLab Runner jobs fail at 4:20 AM
Summary
I run a scheduled GitLab runner every night. Some of the jobs started at 4:20 AM fail, and this happens every night. However, not all of them fail, and it's not always the same jobs or the same number of jobs that fail.
The error message (in journalctl) says:
ERROR: Job failed (system failure): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? (executor_docker.go:837:120s) duration=3m29.686047725s
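To check that the failures really cluster around 4:20 AM, the journal output can be bucketed by minute. The following is only a rough sketch: it assumes journalctl's default short output format (month, day, time, host, then the message), and the function name `failures_per_minute` is my own, not part of any GitLab tooling.

```python
import re
from collections import Counter

# Assumed line shape (journalctl's default short output), e.g.:
#   "Jul 25 04:20:13 host gitlab-runner[1234]: ERROR: Job failed ..."
TIMESTAMP = re.compile(r"^\w{3}\s+\d{1,2}\s+(\d{2}):(\d{2}):\d{2}\s")

# Text taken from the error message quoted above.
MARKER = "Cannot connect to the Docker daemon"


def failures_per_minute(lines):
    """Count Docker-daemon connection failures per HH:MM bucket."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue  # not the failure we are looking for
        match = TIMESTAMP.match(line)
        if match:
            counts[f"{match.group(1)}:{match.group(2)}"] += 1
    return counts
```

Passing `sys.stdin` to the function while piping the runner's journal into the script should give one count per minute. Note that the journal timestamp is the time the job failed, not the time it started.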
Steps to reproduce
Configure a scheduled pipeline to run every day at 4 AM. Make sure to schedule enough jobs that the runner is still busy at 4:20 AM. In my case, there are 200 jobs that do basically nothing but echo a line to the console.
The .gitlab-ci.yml looks like this:
.gitlab-ci.yml
stages:
  - build
.testjob: &testjob
  image: maven:3-jdk-8
  stage: build
  script:
    - echo DNS is working
Job1:
  <<: *testjob
Job2:
  <<: *testjob
Job3:
  <<: *testjob
Job4:
  <<: *testjob
Job5:
  <<: *testjob
Job6:
  <<: *testjob
Job7:
  <<: *testjob
Job8:
  <<: *testjob
Job9:
  <<: *testjob
Job10:
  <<: *testjob
Job11:
  <<: *testjob
Job12:
  <<: *testjob
Job13:
  <<: *testjob
Job14:
  <<: *testjob
Job15:
  <<: *testjob
Job16:
  <<: *testjob
Job17:
  <<: *testjob
Job18:
  <<: *testjob
Job19:
  <<: *testjob
Job20:
  <<: *testjob
Job21:
  <<: *testjob
Job22:
  <<: *testjob
Job23:
  <<: *testjob
Job24:
  <<: *testjob
Job25:
  <<: *testjob
Job26:
  <<: *testjob
Job27:
  <<: *testjob
Job28:
  <<: *testjob
Job29:
  <<: *testjob
Job30:
  <<: *testjob
Job31:
  <<: *testjob
Job32:
  <<: *testjob
Job33:
  <<: *testjob
Job34:
  <<: *testjob
Job35:
  <<: *testjob
Job36:
  <<: *testjob
Job37:
  <<: *testjob
Job38:
  <<: *testjob
Job39:
  <<: *testjob
Job40:
  <<: *testjob
Job41:
  <<: *testjob
Job42:
  <<: *testjob
Job43:
  <<: *testjob
Job44:
  <<: *testjob
Job45:
  <<: *testjob
Job46:
  <<: *testjob
Job47:
  <<: *testjob
Job48:
  <<: *testjob
Job49:
  <<: *testjob
Job50:
  <<: *testjob
Job51:
  <<: *testjob
Job52:
  <<: *testjob
Job53:
  <<: *testjob
Job54:
  <<: *testjob
Job55:
  <<: *testjob
Job56:
  <<: *testjob
Job57:
  <<: *testjob
Job58:
  <<: *testjob
Job59:
  <<: *testjob
Job60:
  <<: *testjob
Job61:
  <<: *testjob
Job62:
  <<: *testjob
Job63:
  <<: *testjob
Job64:
  <<: *testjob
Job65:
  <<: *testjob
Job66:
  <<: *testjob
Job67:
  <<: *testjob
Job68:
  <<: *testjob
Job69:
  <<: *testjob
Job70:
  <<: *testjob
Job71:
  <<: *testjob
Job72:
  <<: *testjob
Job73:
  <<: *testjob
Job74:
  <<: *testjob
Job75:
  <<: *testjob
Job76:
  <<: *testjob
Job77:
  <<: *testjob
Job78:
  <<: *testjob
Job79:
  <<: *testjob
Job80:
  <<: *testjob
Job81:
  <<: *testjob
Job82:
  <<: *testjob
Job83:
  <<: *testjob
Job84:
  <<: *testjob
Job85:
  <<: *testjob
Job86:
  <<: *testjob
Job87:
  <<: *testjob
Job88:
  <<: *testjob
Job89:
  <<: *testjob
Job90:
  <<: *testjob
Job91:
  <<: *testjob
Job92:
  <<: *testjob
Job93:
  <<: *testjob
Job94:
  <<: *testjob
Job95:
  <<: *testjob
Job96:
  <<: *testjob
Job97:
  <<: *testjob
Job98:
  <<: *testjob
Job99:
  <<: *testjob
Job100:
  <<: *testjob
Job101:
  <<: *testjob
Job102:
  <<: *testjob
Job103:
  <<: *testjob
Job104:
  <<: *testjob
Job105:
  <<: *testjob
Job106:
  <<: *testjob
Job107:
  <<: *testjob
Job108:
  <<: *testjob
Job109:
  <<: *testjob
Job110:
  <<: *testjob
Job111:
  <<: *testjob
Job112:
  <<: *testjob
Job113:
  <<: *testjob
Job114:
  <<: *testjob
Job115:
  <<: *testjob
Job116:
  <<: *testjob
Job117:
  <<: *testjob
Job118:
  <<: *testjob
Job119:
  <<: *testjob
Job120:
  <<: *testjob
Job121:
  <<: *testjob
Job122:
  <<: *testjob
Job123:
  <<: *testjob
Job124:
  <<: *testjob
Job125:
  <<: *testjob
Job126:
  <<: *testjob
Job127:
  <<: *testjob
Job128:
  <<: *testjob
Job129:
  <<: *testjob
Job130:
  <<: *testjob
Job131:
  <<: *testjob
Job132:
  <<: *testjob
Job133:
  <<: *testjob
Job134:
  <<: *testjob
Job135:
  <<: *testjob
Job136:
  <<: *testjob
Job137:
  <<: *testjob
Job138:
  <<: *testjob
Job139:
  <<: *testjob
Job140:
  <<: *testjob
Job141:
  <<: *testjob
Job142:
  <<: *testjob
Job143:
  <<: *testjob
Job144:
  <<: *testjob
Job145:
  <<: *testjob
Job146:
  <<: *testjob
Job147:
  <<: *testjob
Job148:
  <<: *testjob
Job149:
  <<: *testjob
Job150:
  <<: *testjob
Job151:
  <<: *testjob
Job152:
  <<: *testjob
Job153:
  <<: *testjob
Job154:
  <<: *testjob
Job155:
  <<: *testjob
Job156:
  <<: *testjob
Job157:
  <<: *testjob
Job158:
  <<: *testjob
Job159:
  <<: *testjob
Job160:
  <<: *testjob
Job161:
  <<: *testjob
Job162:
  <<: *testjob
Job163:
  <<: *testjob
Job164:
  <<: *testjob
Job165:
  <<: *testjob
Job166:
  <<: *testjob
Job167:
  <<: *testjob
Job168:
  <<: *testjob
Job169:
  <<: *testjob
Job170:
  <<: *testjob
Job171:
  <<: *testjob
Job172:
  <<: *testjob
Job173:
  <<: *testjob
Job174:
  <<: *testjob
Job175:
  <<: *testjob
Job176:
  <<: *testjob
Job177:
  <<: *testjob
Job178:
  <<: *testjob
Job179:
  <<: *testjob
Job180:
  <<: *testjob
Job181:
  <<: *testjob
Job182:
  <<: *testjob
Job183:
  <<: *testjob
Job184:
  <<: *testjob
Job185:
  <<: *testjob
Job186:
  <<: *testjob
Job187:
  <<: *testjob
Job188:
  <<: *testjob
Job189:
  <<: *testjob
Job190:
  <<: *testjob
Job191:
  <<: *testjob
Job192:
  <<: *testjob
Job193:
  <<: *testjob
Job194:
  <<: *testjob
Job195:
  <<: *testjob
Job196:
  <<: *testjob
Job197:
  <<: *testjob
Job198:
  <<: *testjob
Job199:
  <<: *testjob
Job200:
  <<: *testjob
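Since all 200 jobs are identical, the file does not need to be written by hand. A minimal sketch of a generator follows; the function name `render_ci_config` and the default job count are my own choices, and the job definition mirrors the one in the file above.

```python
def render_ci_config(job_count: int = 200) -> str:
    """Return a .gitlab-ci.yml with `job_count` identical jobs."""
    lines = [
        "stages:",
        "  - build",
        ".testjob: &testjob",
        "  image: maven:3-jdk-8",
        "  stage: build",
        "  script:",
        "    - echo DNS is working",
    ]
    for n in range(1, job_count + 1):
        # Each job only merges the shared template via the YAML anchor.
        lines.append(f"Job{n}:")
        lines.append("  <<: *testjob")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print to stdout; redirect into .gitlab-ci.yml to use it.
    print(render_ci_config(), end="")
```

Lowering `job_count` makes it possible to test whether the number of failing jobs depends on how long the runner stays busy past 4:20 AM.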
Actual behavior
Some jobs fail, but not all of them and not always the same ones.
Expected behavior
All jobs should succeed.
Relevant logs and/or screenshots
job log
Running with gitlab-runner 12.1.0 (de7731dd)
on XXXX XXXXXXX
Using Docker executor with image maven:3-jdk-8 ...
Pulling docker image maven:3-jdk-8 ...
Using docker image sha256:2fa604c5c53b6adaeab11550b71048693a4b018a0e7ac6a93af87b09b25100af for maven:3-jdk-8 ...
Running on runner-XXXXXXXX-project-32509-concurrent-0 via XXXX...
Fetching changes with git depth set to 50...
Reinitialized existing Git repository in /builds/se-arc/gitlab-runner-test/.git/
Checking out 5d7f6d66 as master...
Skipping Git submodules setup
ERROR: Job failed (system failure): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? (executor_docker.go:837:120s)
Environment description
config.toml contents
[[runners]]
name = "XXXX"
url = "https://XXXXXXXXXXXXX.de/"
token = "XXXXXXXXXXXXXXXXX"
executor = "docker"
[runners.docker]
tls_verify = false
image = "alpine:latest"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]
extra_hosts = ["XXXXXXXXXXXX:XXX.XXX.XXX.XXX"]
shm_size = 0
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
Used GitLab Runner version
Version: 12.1.0
Git revision: de7731dd
Git branch: 12-1-stable
GO version: go1.8.7
Built: 2019-07-19T13:53:04+0000
OS/Arch: linux/amd64
Possible fixes
According to the error message, the error was thrown at executor_docker.go:837:120s. However, I know neither how that location relates to the bug nor any workaround.
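One thing that might help narrow this down is to probe the Docker socket on the runner host around 4:20 AM and see whether it stops accepting connections. The sketch below is an assumption-laden diagnostic, not a fix: the helper names `docker_socket_reachable` and `watch` are hypothetical, and a successful connect only shows that something is listening on the socket, not that the daemon is healthy.

```python
import socket
import time


def docker_socket_reachable(path="/var/run/docker.sock", timeout=2.0):
    """Return True if a Unix-socket connection to `path` succeeds."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Covers a missing socket file, a refused connection and timeouts.
        return False
    finally:
        sock.close()


def watch(path="/var/run/docker.sock", interval=5.0, rounds=3):
    """Probe the socket `rounds` times, printing a timestamped status."""
    for _ in range(rounds):
        state = "up" if docker_socket_reachable(path) else "DOWN"
        print(time.strftime("%Y-%m-%d %H:%M:%S"), state, flush=True)
        time.sleep(interval)
```

Running `watch` with a larger `rounds` value from shortly before 4:20 AM would show whether the socket becomes unreachable at the same time the jobs fail.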