Add timeouts to all docker-machine command executions
What does this MR do?
In gitlab-com/gl-infra/production-engineering#27396
we saw that docker-machine create hung for several days, leaving
pipelines stuck because the maximum number of VMs was in the
creating state. While we don't know why the command hung, this commit
adds a timeout to every docker-machine command so that a hung command
can no longer block indefinitely.
The timeouts are deliberately conservative, with a 1-hour limit. The Stop timeout remains at 1 minute.
What's the best way to test this MR?
- Configure a TOML file for docker-machine (the run step expects docker-machine.toml):
concurrent = 1
check_interval = 0
shutdown_timeout = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "test-docker-machine"
url = "https://gitlab.example.com"
id = 15
token = "glrt-redacted"
token_obtained_at = 2025-08-20T05:52:33Z
token_expires_at = 0001-01-01T00:00:00Z
executor = "docker+machine"
[runners.cache]
MaxUploadedArchiveSize = 0
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
[runners.docker]
helper_image = "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-v18.2.0"
pull_policy = "if-not-present"
tls_verify = false
image = "ruby:3.3"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
network_mtu = 0
[runners.machine]
IdleCount = 0
IdleScaleFactor = 0.0
IdleCountMin = 0
MachineDriver = "google"
MachineName = "auto-scale-%s"
MachineOptions = [
"google-project=YOUR-GOOGLE-PROJECT",
"google-disk-size=10",
"google-disk-type=pd-ssd",
"google-machine-type=n2-standard-2",
"google-maintenance-policy=TERMINATE",
"google-machine-image=ubuntu-os-cloud/global/images/ubuntu-2204-jammy-v20250815"
]
- Obtain Google service credentials and save them as
google-creds.json.
- Run the runner:
GOOGLE_APPLICATION_CREDENTIALS=google-creds.json gitlab-runner run -c docker-machine.toml
- Run a CI job.
Edited by Stan Hu