Concurrent doesn't limit the number of docker-machine VMs created
Summary
In #2611 (comment 801403416) I relayed a concern raised by https://gitlab.my.salesforce.com/00161000004zoBW: that `concurrent` wasn't behaving as documented.
It turns out that setting the global `concurrent` value does limit the parallelism of overall job execution for a given runner manager, but it does not explicitly limit the number of VMs created by docker-machine.
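The distinction can be illustrated with a simplified Go model. This is a sketch under stated assumptions, not GitLab Runner's source. A buffered channel of capacity `concurrent` stands in for the global job limit, `runJobs` is a hypothetical name, and a short sleep stands in for each job's script. The model only covers the half that works as documented: jobs running at once never exceed `concurrent`. Per this report, VM provisioning sits outside that limit.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runJobs is a hypothetical model, not GitLab Runner's implementation: a
// buffered channel of capacity `concurrent` acts as the global job semaphore.
// It returns the highest number of jobs observed running at the same time.
func runJobs(concurrent, jobs int) int {
	sem := make(chan struct{}, concurrent)
	var (
		mu         sync.Mutex
		running    int
		maxRunning int
		wg         sync.WaitGroup
	)
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // blocks while `concurrent` jobs hold a slot
			defer func() { <-sem }() // release the slot when the job ends

			mu.Lock()
			running++
			if running > maxRunning {
				maxRunning = running
			}
			mu.Unlock()

			time.Sleep(20 * time.Millisecond) // stand-in for the job script

			mu.Lock()
			running--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return maxRunning
}

func main() {
	// concurrent = 5 from this issue's config.toml; 10 jobs as in its .gitlab-ci.yml.
	fmt.Println("peak concurrent jobs:", runJobs(5, 10))
	// The reported problem is that VM provisioning is not gated by this semaphore.
}
```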
Steps to reproduce
- Create sustained job demand at or above the value of `concurrent`.
- Register multiple `[[runners]]` (an example `config.toml` is provided below).
- Observe that the number of docker+machine VMs created exceeds the stipulated `concurrent` limit.
This is demonstrated in the attached videos:
- Video 1: Context & Setup Description
- Video 2: Watch a pipeline that creates sustained parallel job demand spawn more VMs than concurrent implies
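The last reproduction step can also be checked from the runner host by counting machines. The following Go sketch assumes the output of `docker-machine ls -q` (one machine name per line) has been captured, and counts names containing the `dm-as-` fragment from the `MachineName` templates in this issue. `countMachines` is a hypothetical helper, and the sample names are invented for illustration; real machine names may carry additional prefixes or suffixes.

```go
package main

import (
	"fmt"
	"strings"
)

// countMachines is a hypothetical helper: it counts names in
// `docker-machine ls -q` output (one name per line) that contain fragment.
func countMachines(lsOutput, fragment string) int {
	n := 0
	for _, line := range strings.Split(lsOutput, "\n") {
		name := strings.TrimSpace(line)
		if name != "" && strings.Contains(name, fragment) {
			n++
		}
	}
	return n
}

func main() {
	const concurrent = 5
	// Invented sample output; in practice it could be captured with
	// exec.Command("docker-machine", "ls", "-q").Output() from os/exec.
	sample := `dm-as-1-aaa
dm-as-1-bbb
dm-as-2-ccc
dm-as-2-ddd
dm-as-3-eee
dm-as-3-fff
dm-as-3-ggg
unrelated-vm`
	n := countMachines(sample, "dm-as-")
	fmt.Printf("%d docker+machine VMs, concurrent = %d, exceeds: %v\n", n, concurrent, n > concurrent)
}
```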
config.toml
```toml
concurrent = 5
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "k2-dm"
  url = "https://gitlab.jreid.dev"
  token = "redacted"
  executor = "docker+machine"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "alpine:latest"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.machine]
    IdleCount = 1
    IdleScaleFactor = 0.0
    IdleCountMin = 0
    MachineDriver = "virtualbox"
    MachineName = "dm-as-1-%s"

[[runners]]
  name = "k2-dm-2"
  url = "https://gitlab.jreid.dev"
  token = "redacted"
  executor = "docker+machine"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "alpine:latest"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.machine]
    IdleCount = 1
    IdleScaleFactor = 0.0
    IdleCountMin = 0
    MachineDriver = "virtualbox"
    MachineName = "dm-as-2-%s"

[[runners]]
  name = "k2-dm-3"
  url = "https://gitlab.jreid.dev"
  token = "redacted"
  executor = "docker+machine"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "alpine:latest"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.machine]
    IdleCount = 1
    IdleScaleFactor = 0.0
    IdleCountMin = 0
    MachineDriver = "virtualbox"
    MachineName = "dm-as-3-%s"
```
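A rough back-of-the-envelope estimate shows why this configuration can exceed `concurrent`. It assumes that at most `concurrent` machines are busy with jobs and that each `[[runners]]` entry independently keeps `IdleCount` idle machines on top of those. Machines that are still being created or removed are ignored, so this is a simplification rather than a documented bound. `estimatePeakVMs` and `runnerMachine` are hypothetical names.

```go
package main

import "fmt"

// runnerMachine holds the [runners.machine] value used by this estimate.
type runnerMachine struct {
	Name      string
	IdleCount int
}

// estimatePeakVMs is a hypothetical helper: busy machines are capped by
// `concurrent`, but each runner's idle pool is counted on top of that.
func estimatePeakVMs(concurrent int, runners []runnerMachine) int {
	total := concurrent // machines running jobs
	for _, r := range runners {
		total += r.IdleCount // idle machines kept warm per [[runners]] entry
	}
	return total
}

func main() {
	runners := []runnerMachine{
		{"k2-dm", 1},
		{"k2-dm-2", 1},
		{"k2-dm-3", 1},
	}
	fmt.Println(estimatePeakVMs(5, runners)) // 8
}
```

Under these assumptions, the three runners above could hold roughly 5 + 3 × 1 = 8 VMs while only 5 jobs run.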
.gitlab-ci.yml
```yaml
stages:
  - one
  # - two

stage-1-job-1:
  stage: one
  script:
    - echo "hello world"
    - sleep 180
    - echo "goodbye world"

stage-1-job-2:
  stage: one
  script:
    - echo "hello again"
    - sleep 180
    - echo "seeya"

stage-1-job-3:
  stage: one
  script:
    - echo "hello world"
    - sleep 120
    - echo "goodbye world"

stage-1-job-4:
  stage: one
  script:
    - echo "hello again"
    - sleep 120
    - echo "seeya"

stage-1-job-5:
  stage: one
  script:
    - echo "hello again"
    - sleep 120
    - echo "seeya"

stage-1-job-6:
  stage: one
  script:
    - echo "hello again"
    - sleep 120
    - echo "seeya"

stage-1-job-7:
  stage: one
  script:
    - echo "hello again"
    - sleep 60
    - echo "seeya"

stage-1-job-8:
  stage: one
  script:
    - echo "hello again"
    - sleep 60
    - echo "seeya"

stage-1-job-9:
  stage: one
  script:
    - echo "hello again"
    - sleep 60
    - echo "seeya"

stage-1-job-10:
  stage: one
  script:
    - echo "hello again"
    - sleep 60
    - echo "seeya"
```
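To see why this pipeline produces sustained demand, the following Go sketch models the ten `sleep` durations against `concurrent = 5`. It assumes that jobs are picked up in the order listed and that each job takes whichever slot frees up first. It ignores VM boot time, runner polling, and which `[[runners]]` entry takes which job. `schedule` is a hypothetical helper.

```go
package main

import "fmt"

// schedule assigns jobs, in pipeline order, to whichever of `concurrent`
// slots frees up first. It returns each job's start time and the time the
// last job finishes, in seconds. VM boot time and polling are ignored.
func schedule(durations []int, concurrent int) (starts []int, makespan int) {
	freeAt := make([]int, concurrent) // time at which each slot becomes free
	for _, d := range durations {
		best := 0
		for i := range freeAt {
			if freeAt[i] < freeAt[best] {
				best = i
			}
		}
		start := freeAt[best]
		starts = append(starts, start)
		freeAt[best] = start + d
		if freeAt[best] > makespan {
			makespan = freeAt[best]
		}
	}
	return starts, makespan
}

func main() {
	// sleep durations of stage-1-job-1 through stage-1-job-10
	durations := []int{180, 180, 120, 120, 120, 120, 60, 60, 60, 60}
	starts, makespan := schedule(durations, 5)
	fmt.Println("start times:", starts) // [0 0 0 0 0 120 120 120 180 180]
	fmt.Println("makespan:", makespan)  // 240
}
```

Under these assumptions, all five slots stay busy for the first 180 seconds while five jobs wait in the queue, which is the window in which extra VMs appear.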
Actual behaviour
The docker+machine autoscaler creates additional VMs that aren't immediately handed jobs, because the number of allowable concurrent jobs has already been reached.
Expected behaviour
Up for debate. On one hand, it's almost convenient to have a "warmed up" VM ready to accept an additional job. This can reduce job wait time, particularly if the required machine type differs from that of a job that is about to finish.
Example:
`concurrent = 4`
- Active jobs: t2.medium = 3, t2.large = 1
- Pending jobs: t2.medium = 0, t2.large = 2
Having up to three t2.large docker+machine VMs spun up, two of which will just be waiting to accept pending jobs for an indeterminate amount of time, will definitely save VM startup time for the pending jobs as the active jobs complete.
On the other hand, this is somewhat unexpected if you're relying on `concurrent` to limit the total number of active VMs; for example, if you're managing a limited amount of IP address space for a given region or subnet.
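The trade-off can be quantified with the `concurrent = 4` example. The Go sketch below compares the warm-up behaviour described here with a strict cap that never lets the total VM count exceed `concurrent`. The strict cap is a hypothetical policy for comparison, not an existing GitLab Runner setting, and `totalVMs` is a hypothetical name. If each VM uses one IP address, the totals also approximate address consumption.

```go
package main

import "fmt"

// totalVMs returns how many VMs exist when `active` jobs are running and each
// of `pending` jobs may get a pre-warmed VM. strict=true applies a hypothetical
// policy (not an existing runner setting) that keeps the total <= concurrent.
func totalVMs(active, pending, concurrent int, strict bool) int {
	warm := pending
	if strict {
		room := concurrent - active
		if room < 0 {
			room = 0
		}
		if warm > room {
			warm = room
		}
	}
	return active + warm
}

func main() {
	// concurrent = 4; 3 t2.medium + 1 t2.large active; 2 t2.large pending.
	fmt.Println("total VMs (warm-up):", totalVMs(4, 2, 4, false))  // 6
	fmt.Println("total VMs (strict cap):", totalVMs(4, 2, 4, true)) // 4
}
```

The warm-up variant trades two extra VMs for shorter waits on the pending t2.large jobs. The strict variant saves those VMs but makes pending jobs wait for a VM to boot.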
Used GitLab Runner version
arch=amd64 os=linux pid=141019 revision=5316d4ac version=14.6.0