CI docker-machine/gitlab-runner Bastion is broken
When I accessed the Digital Ocean bastion, no new gitlab-runner machines were being spun up via docker-machine, and the existing ones were in an error state (they required forced removal):
root@bastion-debian9:~# docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS
runner-etstjmcm-autoscale-bastion-debian9-1602675098-5ae135f1 - digitalocean Error Unknown dropletID is invalid because cannot be less than 1
runner-etstjmcm-autoscale-bastion-debian9-1602675098-7c880f09 - digitalocean Error Unknown dropletID is invalid because cannot be less than 1
runner-etstjmcm-autoscale-bastion-debian9-1602675098-956f5218 - digitalocean Error Unknown dropletID is invalid because cannot be less than 1
runner-etstjmcm-autoscale-bastion-debian9-1602675098-04186f65 - digitalocean Error Unknown dropletID is invalid because cannot be less than 1
runner-etstjmcm-autoscale-bastion-debian9-1602675098-77074a5c - digitalocean Error Unknown dropletID is invalid because cannot be less than 1
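A minimal Python sketch for picking out the machines that need forced removal from `docker-machine ls` output like the above. It assumes the default table layout, where the first four whitespace-separated columns are NAME, ACTIVE, DRIVER and STATE (the later columns can be empty, so they are ignored). The helper names `errored_machines` and `removal_commands` are hypothetical. The script only prints `docker-machine rm -f` commands and does not run them.

```python
"""Sketch: find docker-machine hosts stuck in the Error state.

Assumes the default `docker-machine ls` table, whose first four
whitespace-separated columns are NAME, ACTIVE, DRIVER and STATE.
Columns after STATE (URL, SWARM, DOCKER, ERRORS) may be empty, so they are ignored.
"""


def errored_machines(ls_output: str) -> list[str]:
    """Return the names of machines whose STATE column reads "Error"."""
    names = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and anything too short to parse.
        if len(fields) < 4 or fields[0] == "NAME":
            continue
        name, _active, _driver, state = fields[:4]
        if state == "Error":
            names.append(name)
    return names


def removal_commands(names: list[str]) -> list[str]:
    """Build (but do not run) one force-remove command per machine."""
    return [f"docker-machine rm -f {name}" for name in names]


if __name__ == "__main__":
    # Two rows in the format shown in this issue: one errored, one healthy.
    sample = (
        "NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS\n"
        "runner-etstjmcm-autoscale-bastion-debian9-1602675098-5ae135f1 - digitalocean "
        "Error Unknown dropletID is invalid because cannot be less than 1\n"
        "runner-etstjmcm-autoscale-bastion-debian9-1602677856-2c4a297f - digitalocean "
        "Running tcp://161.35.92.0:2376 v19.03.13\n"
    )
    for command in removal_commands(errored_machines(sample)):
        print(command)
    # prints: docker-machine rm -f runner-etstjmcm-autoscale-bastion-debian9-1602675098-5ae135f1
```

Machines that are `Running` but report "Unable to query docker version" in ERRORS are deliberately not matched, since their STATE column is not `Error`.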
Going back through the #buildstream IRC logs, it looks like we've hit this problem before: https://irclogs.baserock.org/%23baserock.2017-12-14.log.html
Looking at /etc/gitlab-runner/config.toml, it's using "digitalocean-image=fedora-30-x64" for the runner machines. Digital Ocean removed this image on Oct 8th:
https://www.digitalocean.com/docs/release-notes/images/#upcoming-changes
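A minimal Python sketch for checking and bumping the image option in config.toml. It assumes the option appears as a quoted `"digitalocean-image=<slug>"` entry in `MachineOptions`, as in the sample config in the script; that sample's layout is an assumption, not a copy of the bastion's real config. It is a plain-text regex scan rather than a full TOML parse, which keeps the rest of the file's formatting and comments untouched when rewriting. `machine_images` and `bump_image` are hypothetical helper names.

```python
import re

# Hypothetical helper: matches a quoted MachineOptions entry such as
# "digitalocean-image=fedora-30-x64" anywhere in config.toml's text.
IMAGE_OPTION = re.compile(r'"digitalocean-image=([^"]+)"')


def machine_images(config_text: str) -> list[str]:
    """Return every digitalocean-image slug referenced in the config text."""
    return IMAGE_OPTION.findall(config_text)


def bump_image(config_text: str, new_image: str) -> str:
    """Point every digitalocean-image option at new_image; leave other text untouched."""
    # A function replacement avoids re.sub interpreting backslashes in new_image.
    return IMAGE_OPTION.sub(lambda _match: f'"digitalocean-image={new_image}"', config_text)


if __name__ == "__main__":
    # Assumed shape of the relevant part of /etc/gitlab-runner/config.toml.
    config = """\
[[runners]]
  executor = "docker+machine"
  [runners.machine]
    MachineDriver = "digitalocean"
    MachineOptions = [
      "digitalocean-image=fedora-30-x64",
    ]
"""
    print(machine_images(config))                               # ['fedora-30-x64']
    print(machine_images(bump_image(config, "fedora-31-x64")))  # ['fedora-31-x64']
```

After writing the updated text back, gitlab-runner still needs restarting so that newly created machines pick up the new image.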
Bumping the image to "digitalocean-image=fedora-31-x64" (I'm unsure whether anything else in the config needs changing), restarting gitlab-runner, forcefully removing all existing machines and rebooting the DO instance seemingly gets past the previous error at first. However, the new runners appear to take too long to become resolvable and time out when a pipeline is retriggered. There's also a cgroups error from the docker executor: https://gitlab.com/BuildStream/buildstream/-/jobs/790393700
I expect some update is needed somewhere in the chain: docker-machine, docker, the gitlab-runner config, or something around the cgroup subsystem.
docker-machine ls output after the change:
NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS
runner-etstjmcm-autoscale-bastion-debian9-1602677856-2c4a297f - digitalocean Running tcp://161.35.92.0:2376 v19.03.13
runner-etstjmcm-autoscale-bastion-debian9-1602677856-4d24657a - digitalocean Running tcp://134.209.88.15:2376 v19.03.13
runner-etstjmcm-autoscale-bastion-debian9-1602677857-6e72a3e9 - digitalocean Running tcp://161.35.92.158:2376 Unknown Unable to query docker version: Cannot connect to the docker engine endpoint
runner-etstjmcm-autoscale-bastion-debian9-1602677857-164df275 - digitalocean Running tcp://161.35.84.94:2376 v19.03.13
runner-etstjmcm-autoscale-bastion-debian9-1602677858-0dc01bb2 - digitalocean Running tcp://161.35.80.234:2376 v19.03.13
runner-etstjmcm-autoscale-bastion-debian9-1602677858-36aafeec - digitalocean Running tcp://128.199.57.197:2376 v19.03.13
runner-etstjmcm-autoscale-bastion-debian9-1602677858-bdf79c72 - digitalocean Running tcp://188.166.34.28:2376 v19.03.13
runner-etstjmcm-autoscale-bastion-debian9-1602677859-571633ff - digitalocean Running tcp://161.35.84.194:2376 v19.03.13
runner-etstjmcm-autoscale-bastion-debian9-1602677859-f51bdb95 - digitalocean Running tcp://104.248.81.93:2376 v19.03.13
runner-etstjmcm-autoscale-bastion-debian9-1602677937-feb928c2 - digitalocean Running tcp://188.166.43.164:2376 v19.03.13
runner-etstjmcm-autoscale-bastion-debian9-1602678182-057753fb - digitalocean Running tcp://142.93.238.114:2376 Unknown Unable to query docker version: Cannot connect to the docker engine endpoint
runner-etstjmcm-autoscale-bastion-debian9-1602678198-837886fc - digitalocean Running tcp://167.71.5.239:2376 Unknown Unable to query docker version: Cannot connect to the docker engine endpoint
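To put a number on "taking too long", a minimal Python sketch that polls a new machine's Docker endpoint until it accepts TCP connections. It assumes port 2376 (taken from the URL column above); the 300-second default timeout and 5-second interval are arbitrary placeholders, and `wait_for_port` is a hypothetical helper. It only checks that the port is open. It does not do the TLS handshake or the docker version query that docker-machine performs, so a machine can pass this check and still show "Unable to query docker version".

```python
import socket
import time


def wait_for_port(host: str, port: int = 2376, timeout: float = 300.0,
                  interval: float = 5.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Only proves the port accepts connections; it does not perform the TLS
    handshake or the docker version query that docker-machine runs.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt waits at most `interval` seconds to connect.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable or timed out: the engine is not up yet.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

For example, timing `wait_for_port("161.35.92.158")` with `time.monotonic()` against one of the machines reporting "Cannot connect to the docker engine endpoint" would show whether it eventually comes up or never listens at all.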
I've set it back to "digitalocean-image=fedora-30-x64" and restarted the DO instance.