docker+machine autoscaling builds on GCP failing
Our docker+machine autoscaling builds on GCP started failing several days ago. No changes on our side (just some c++ code changes that triggered new builds).
The failure appears to be related to the runner service launching the remote docker on the spawned instance using the "--storage-driver aufs" option, which leads to missing driver errors.
This is the high level GitLab log for the pipeline:
Running with gitlab-runner 13.9.0 (2ebc4dc4)
on gitlab-runner-gce-root-2 9db142ae
Preparing the "docker+machine" executor
05:48
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Job failed (system failure): exit status 1
The parent runner instance's log:
Apr 29 01:17:42 gitlab-runner-root-2 gitlab-runner[4282]: Copying certs to the remote machine... driver=google name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 operation=create
Apr 29 01:17:44 gitlab-runner-root-2 gitlab-runner[4282]: Setting Docker configuration on the remote daemon... driver=google name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 operation=create
Apr 29 01:17:45 gitlab-runner-root-2 gitlab-runner[4282]: Feeding runners to channel builds=1
Apr 29 01:17:45 gitlab-runner-root-2 gitlab-runner[4282]: Docker Machine Details creating=1 idle=0 maxMachineCreate=0 maxMachines=0 minIdleCount=0 removing=0 runner=9db142ae time=2023-04-29 01:17:45.076152166 +0000 UTC m=+474.234273866 total=1 used=0
Apr 29 01:17:45 gitlab-runner-root-2 gitlab-runner[4282]: Failed to request job: runner requestConcurrency meet builds=1 runner=9db142ae
Apr 29 01:17:46 gitlab-runner-root-2 gitlab-runner[4282]: Submitting job to coordinator... ok code=200 job=4200869459 job-status= runner=9db142ae update-interval=0s
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: ERROR: Error creating machine: Error running provisioning: ssh command error: driver=google name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 operation=create
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: ERROR: command : sudo systemctl -f start docker driver=google name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 operation=create
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: ERROR: err : exit status 1 driver=google name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 operation=create
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: ERROR: output : Job for docker.service failed because the control process exited with error code. driver=google name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 operation=create
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: ERROR: See "systemctl status docker.service" and "journalctl -xe" for details. driver=google name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 operation=create
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: WARNING: Problem while reading command output error=read |0: file already closed
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: WARNING: Problem while reading command output error=read |0: file already closed
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: Skipping provision retry on failed machine error=exit status 1 name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: ERROR: Machine creation failed error=exit status 1 name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 time=34.562191809s
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: WARNING: Requesting machine removal lifetime=34.562447574s name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 now=2023-04-29 01:17:47.184745963 +0000 UTC m=+476.342867672 reason=Failed to create used=34.562447918s usedCount=0
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: WARNING: Stopping machine lifetime=34.58787779s name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 reason=Failed to create used=24.841869ms usedCount=0
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: Stopping "runner-9db142ae-auto-scale-runner-1682731032-4ec325c9"... name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 operation=stop
Here is the output of "systemctl status docker.service" on a spawned machines that failed to execute the task:
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/docker.service.d
└─10-machine.conf
Active: failed (Result: exit-code) since Sat 2023-04-29 01:20:32 UTC; 13s ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Process: 2409 ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --storage-driver aufs --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem --label provider=google (code=exited, status=1/FAILURE)
Main PID: 2409 (code=exited, status=1/FAILURE)
CPU: 88ms
Apr 29 01:20:32 runner-9db142ae-auto-scale-runner-1682731191-4f4959bc systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Apr 29 01:20:32 runner-9db142ae-auto-scale-runner-1682731191-4f4959bc systemd[1]: Stopped Docker Application Container Engine.
Apr 29 01:20:32 runner-9db142ae-auto-scale-runner-1682731191-4f4959bc systemd[1]: docker.service: Start request repeated too quickly.
Apr 29 01:20:32 runner-9db142ae-auto-scale-runner-1682731191-4f4959bc systemd[1]: docker.service: Failed with result 'exit-code'.
Apr 29 01:20:32 runner-9db142ae-auto-scale-runner-1682731191-4f4959bc systemd[1]: Failed to start Docker Application Container Engine.
And here's "journalctl -xe" on the same machine:
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: Starting Docker Application Container Engine...
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 dockerd[2409]: time="2023-04-29T01:23:19.618022956Z" level=info msg="Starting up"
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 dockerd[2409]: time="2023-04-29T01:23:19.622244404Z" level=info msg="[graphdriver] trying configured driver: aufs"
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 dockerd[2409]: time="2023-04-29T01:23:19.622410834Z" level=warning msg="[graphdriver] WARNING: the aufs storage-driver is deprecated and will be removed in a future release; visit https://docs.docker.com/go/storage-driver/ for more information"
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 dockerd[2409]: time="2023-04-29T01:23:19.624813559Z" level=error msg="AUFS was not found in /proc/filesystems" storage-driver=aufs
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 dockerd[2409]: failed to start daemon: error initializing graphdriver: driver not supported
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: docker.service: Failed with result 'exit-code'.
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: Failed to start Docker Application Container Engine.
Apr 29 01:23:21 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Apr 29 01:23:21 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: Stopped Docker Application Container Engine.
Apr 29 01:23:21 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: docker.service: Start request repeated too quickly.
Apr 29 01:23:21 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: docker.service: Failed with result 'exit-code'.
Apr 29 01:23:21 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: Failed to start Docker Application Container Engine.
Apr 29 01:23:21 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: docker.socket: Failed with result 'service-start-limit-hit'.
Apr 29 01:23:25 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: Stopping User Manager for UID 1004...