"idle_count = 0" never scales in the executors to 0 if capacity_per_instance is greater than 1
Summary
I'm trying to set up larger GitLab Runner executors that can each handle more than one job, but I can't get it working. The idea is to have two different policies:
[[runners.autoscaler.policy]]
periods = ["* 7-21 * * MON-FRI"]
idle_count = 5
idle_time = "30m0s"
scale_factor = 0.0
scale_factor_limit = 0
[[runners.autoscaler.policy]]
idle_count = 0
idle_time = "3m0s"
scale_factor = 0.0
scale_factor_limit = 0
However, this setup doesn't work. Even with the first policy removed, one machine always stays on. If I run 10 jobs, a second machine comes up and is shut down after a while, but the first one persists.
With capacity_per_instance = 1 and gitlab-runner 16.11.0, it works. After changing it to 5, it doesn't work with either version (16.11.0 or 16.11.1).
Steps to reproduce
- Create a runner manager with the TOML shared below
- Run a simple hello world job
- The taskscaler will create a new instance, but the instance never dies
.gitlab-ci.yml
default:
  image: alpine:latest
  tags:
    - Testing
stages:
  - test
test_async:
  stage: test
  script:
    - echo "Hello World"
  parallel: 5
Actual behavior
The instance is never permanently removed. It is deleted when a new job enters the queue, but a new one is created in its place and then stays up indefinitely.
Expected behavior
After 3 minutes of being idle, the instance is shut down.
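To make the expectation concrete, here is a rough sketch of the arithmetic I would expect. It assumes idle_count is a target number of idle job slots (not instances) and that each instance provides capacity_per_instance slots. This is not taskscaler's actual code, and the function name `expected_instances` is made up for illustration.

```python
import math


def expected_instances(running_jobs, idle_count, capacity_per_instance, max_instances):
    """Hypothetical model: instances needed for running jobs plus idle slots.

    Assumption: idle_count counts job slots, and every instance offers
    capacity_per_instance slots. Not taken from the taskscaler source.
    """
    if capacity_per_instance < 1:
        raise ValueError("capacity_per_instance must be at least 1")
    needed_slots = running_jobs + idle_count
    wanted = math.ceil(needed_slots / capacity_per_instance)
    # Never exceed the configured instance limit.
    return min(wanted, max_instances)


if __name__ == "__main__":
    # Values from the config in this issue: idle_count = 0,
    # capacity_per_instance = 5, max_instances = 2.
    for running in (10, 5, 1, 0):
        print(running, expected_instances(running, 0, 5, 2))
```

Under this model, once no jobs are running and idle_count = 0, the expected instance count is 0, so the last instance should be removed once idle_time ("3m0s") has elapsed, regardless of capacity_per_instance.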
Relevant logs and/or screenshots
Environment description
config.toml contents
concurrent = 10
check_interval = 15
log_level = "debug"
connection_max_age = "15m0s"
shutdown_timeout = 300
[session_server]
session_timeout = 1800
[redacted]
name = "Testing"
url = "https://gitlab.com"
id = 1234
limit = 10
request_concurrency = 10
token = "glrt-token"
executor = "docker-autoscaler"
[runners.custom_build_dir]
[runners.cache]
Type = "s3"
Shared = true
MaxUploadedArchiveSize = 0
[runners.cache.s3]
BucketName = "my-bucket-123"
BucketLocation = "eu-west-1"
[runners.docker]
tls_verify = false
image = "busybox:latest"
privileged = false
services_privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
allowed_privileged_services = ["docker:.*dind"]
allowed_pull_policies = ["always", "if-not-present"]
shm_size = 0
network_mtu = 0
[runners.autoscaler]
capacity_per_instance = 5
max_use_count = 100
max_instances = 2
plugin = "fleeting-plugin-aws"
[runners.autoscaler.plugin_config]
name = "Autoscaling-group"
[[runners.autoscaler.policy]]
idle_count = 0
idle_time = "3m0s"
scale_factor = 0.0
scale_factor_limit = 0
Used GitLab Runner version
Version: 16.11.1
Git revision: 535ced5f
Git branch: 16-11-stable
GO version: go1.21.9
Built: 2024-05-03T15:52:38+0000
OS/Arch: linux/amd64
Also tested in 16.11.0
Possible fixes
Possibly related to #37526