Add timeouts to all docker-machine command executions

What does this MR do?

In gitlab-com/gl-infra/production-engineering#27396, a docker-machine create command hung for several days. Pipelines were stuck because the maximum number of VMs was already in the creating state. We don't know why the command hung, but this commit adds a timeout to every docker-machine command so that a hung command can no longer hold up VM creation indefinitely.

The timeouts are deliberately conservative: each command gets a 1-hour limit, except Stop, which keeps its existing 1-minute timeout.

What's the best way to test this MR?

  1. Create a runner configuration that uses the docker+machine executor and save it as docker-machine.toml:
concurrent = 1
check_interval = 0
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "test-docker-machine"
  url = "https://gitlab.example.com"
  id = 15
  token = "glrt-redacted"
  token_obtained_at = 2025-08-20T05:52:33Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker+machine"
  [runners.cache]
    MaxUploadedArchiveSize = 0
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    helper_image = "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-v18.2.0"
    pull_policy = "if-not-present"
    tls_verify = false
    image = "ruby:3.3"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
    network_mtu = 0
  [runners.machine]
    IdleCount = 0
    IdleScaleFactor = 0.0
    IdleCountMin = 0
    MachineDriver = "google"
    MachineName = "auto-scale-%s"
    MachineOptions = [
      "google-project=YOUR-GOOGLE-PROJECT",
      "google-disk-size=10",
      "google-disk-type=pd-ssd",
      "google-machine-type=n2-standard-2",
      "google-maintenance-policy=TERMINATE",
      "google-machine-image=ubuntu-os-cloud/global/images/ubuntu-2204-jammy-v20250815"
    ]
  2. Obtain Google service account credentials and save them as google-creds.json.
  3. Run the runner:
GOOGLE_APPLICATION_CREDENTIALS=google-creds.json gitlab-runner run -c docker-machine.toml
  4. Run a CI job.
Edited by Stan Hu
