Docker-machine on digitalocean does not wait long enough (and leaves a zombie machine)

Summary

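gitlab-runner using docker-machine on DigitalOcean has stopped working: provisioning gives up before the Docker daemon on the new droplet is reachable, and the droplet is left running.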

Steps to reproduce

Set up gitlab-runner on DigitalOcean using an Ubuntu 18 or 20 base image. Start a job and watch the journalctl output of gitlab-runner. Use the most recent GitLab build of docker-machine and the most recent gitlab-runner:

root@gitlab-runner-bastion:~# gitlab-runner --version
Version:      13.6.0
Git revision: 8fa89735
Git branch:   13-6-stable
GO version:   go1.13.8
Built:        2020-11-21T06:16:31+0000
OS/Arch:      linux/amd64
root@gitlab-runner-bastion:~# docker-machine --version
docker-machine version 0.16.2-gitlab.8, build 38aad0d2

Actual behavior

gitlab-runner uses docker-machine to start a new machine. It creates a droplet, waits for SSH, copies over certificates, restarts Docker on the target, tries to connect to the target's Docker daemon, and fails with:

Dec 10 01:42:45 gitlab-runner-bastion gitlab-runner[24609]: ERROR: Error creating machine: Error running provisioning: Unable to verify the Docker daemon is listening: Maximum number of retries (10) exceeded  driver=digitalocean name=runner-qemdeozz-gitlab-runner-autoscale-1607564407-6a33de96 operation=create

That is the message from the older gitlab-runner. The newer gitlab-runner no longer seems to print it, but the surrounding log output is otherwise the same:

er-qemdeozz-gitlab-runner-autoscale-1607565422-37d6ce51 operation=create
Dec 10 01:59:24 gitlab-runner-bastion gitlab-runner[25328]: Setting Docker configuration on the remote daemon...  driver=digitalocean name=runner-qemdeozz-gitlab-runner-autoscale-1607565422-37d6ce51 operation=create
Dec 10 02:00:07 gitlab-runner-bastion gitlab-runner[25328]: WARNING: Problem while reading command output       error=read |0: file already closed

The machine remains up (costing money!) and gitlab-runner tries to start another machine. Docker-machine agrees the target docker isn't up:

$ docker-machine ls
runner-XXXXX-gitlab-runner-autoscale-XXXXX   -        digitalocean   Running   tcp://XXX.XXX.XXX.XXX:2376           Unknown   Unable to query docker version: Cannot connect to the docker engine endpoint

But if we wait one more minute then the target docker does come up!

runner-qemdeozz-gitlab-runner-autoscale-1607564842-fdee4eda   -        digitalocean   Running   tcp://104.131.182.250:2376           v20.10.0

And now it just sits there, costing money while gitlab-runner starts up yet another machine it will abandon.
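
To make the abandoned machines easier to spot, the listing above can be filtered for rows that are Running but report an Unknown Docker version. This is only a rough sketch: `find_unreachable` is a hypothetical helper, and it assumes the whitespace-separated column layout shown above with an empty SWARM column. Because a machine can still come up a minute later, a match is a candidate for a manual check, not proof that it is dead.

```python
def find_unreachable(ls_output):
    """Return names of machines that are Running but whose Docker version is Unknown.

    Assumption: rows are whitespace-separated with an empty SWARM column,
    as in the listing above: NAME ACTIVE DRIVER STATE URL DOCKER [ERRORS...].
    """
    names = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and rows too short to classify.
        if len(fields) < 6 or fields[0] == "NAME":
            continue
        name, state, docker = fields[0], fields[3], fields[5]
        if state == "Running" and docker == "Unknown":
            names.append(name)
    return names
```

The returned names could then be reviewed and removed by hand with `docker-machine rm`.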

Expected behavior

gitlab-runner waits long enough for the Docker daemon on the new machine to come up, or exposes a parameter to configure how long it waits.
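
As an illustration of what is being asked for, here is a minimal sketch of a wait with a configurable deadline instead of a fixed retry count. It is not gitlab-runner's or docker-machine's actual code; `wait_until`, `docker_port_open`, and the `timeout` and `interval` parameters are hypothetical names, and the default of 180 seconds is an arbitrary assumption. Port 2376 is taken from the `docker-machine ls` output above.

```python
import socket
import time


def wait_until(check, timeout=180.0, interval=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or `timeout` seconds have elapsed.

    `clock` and `sleep` are injectable so the loop can be tested without
    real waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def docker_port_open(host, port=2376, connect_timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        return False
```

A caller could use it as `wait_until(lambda: docker_port_open("XXX.XXX.XXX.XXX"), timeout=300)`. An open TCP port only shows that something is listening, so a real check would also need to verify the TLS handshake and the Docker API response.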

Relevant logs and/or screenshots

Inline above

Environment description

It's a typical Ubuntu install with the latest docker-machine from GitLab and gitlab-runner. I tried with Ubuntu 18 and 20.

concurrent = 16
check_interval = 10

[session_server]
  session_timeout = 1800

[[runners]]
  name = "global-gitlab-runner-bastion"
  url = "https://gitlab.com/"
  token = "..."
  executor = "docker+machine"
  [runners.custom_build_dir]
  [runners.docker]
    tls_verify = false
    image = "...private..."
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = true
    volumes = ["/cache", "/var/lib/docker:/var/lib/docker", "/var/run/docker.sock:/var/run/docker.sock", "/tmp:/tmp"]
    shm_size = 0
  [runners.cache]
    Type = "s3"
    Path = "docker-images"
    Shared = true
    [runners.cache.s3]
      ServerAddress = "muse-gitlab-runner.nyc3.digitaloceanspaces.com"
      AccessKey = "..."
      SecretKey = "..."
      BucketName = "muse-gitlab-runner"
      BucketLocation = "nyc3"
  [runners.machine]
    IdleCount = 0
    IdleTime = 600
    MaxBuilds = 100
    MachineDriver = "digitalocean"
    MachineName = "gitlab-runner-autoscale-%s"
    MachineOptions = ["digitalocean-image=ubuntu-18-04-x64", "digitalocean-ssh-user=root", "digitalocean-access-token=....", "digitalocean-region=nyc3", "digitalocean-size=s-6vcpu-16gb", "digitalocean-private-networking", "digitalocean-tags=runner"]
    OffPeakPeriods = ["* * 0-9,18-23 * * mon-fri *", "* * * * * sat,sun *"]
    OffPeakTimezone = "America/Denver"
    OffPeakIdleCount = 0

Used GitLab Runner version

Inline above

Possible fixes

None known.