"idle_count = 0" never scales the executors in to 0 when capacity_per_instance is greater than 1

Summary

I'm trying to set up larger GitLab Runner executors that can each handle more than one job, but I can't get it to work. The idea is to have two different policies:

    [[runners.autoscaler.policy]]
      periods = ["* 7-21 * * MON-FRI"]
      idle_count = 5
      idle_time = "30m0s"
      scale_factor = 0.0
      scale_factor_limit = 0

    [[runners.autoscaler.policy]]
      idle_count = 0
      idle_time = "3m0s"
      scale_factor = 0.0
      scale_factor_limit = 0

But that setup doesn't work. Even after removing the first policy, one machine always stays on. If I run 10 jobs, a second machine comes up and is shut down after a while, but the first one persists.

With capacity_per_instance = 1 on gitlab-runner 16.11.0, it works. With capacity_per_instance = 5, it doesn't work on either version (16.11.0 or 16.11.1).
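
To isolate the variable: the only setting that differs between the working and the failing case is capacity_per_instance. Below is a minimal fragment of the two cases, using only values from the config.toml in this report. The comments are annotations added for illustration, not runner output.

```toml
[runners.autoscaler]
  # capacity_per_instance = 1   # control case: works on 16.11.0
  capacity_per_instance = 5     # failing case: one instance stays up on 16.11.0 and 16.11.1
  max_use_count = 100
  max_instances = 2
  plugin = "fleeting-plugin-aws"

[[runners.autoscaler.policy]]
  idle_count = 0                # no idle capacity requested
  idle_time = "3m0s"            # expected: the last instance is removed after 3 minutes
  scale_factor = 0.0
  scale_factor_limit = 0
```

Switching between the commented and uncommented capacity_per_instance line while leaving the policy unchanged should be enough to compare the two behaviors.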

Steps to reproduce

Create a runner manager with the TOML shared below
Run a simple hello world job
The taskscaler creates a new instance, but the instance is never removed

.gitlab-ci.yml

default:
  image: alpine:latest
  tags:
    - Testing

stages:
  - test

test_async:
  stage: test
  script:
    - echo "Hello World"
  parallel: 5

Actual behavior

The instance is never deleted. More precisely, it is deleted when a new job enters the queue, but a new instance is created in its place and stays up forever.

Expected behavior

After 3 minutes, the instance is shut down.

Relevant logs and/or screenshots

Environment description

config.toml contents

concurrent = 10
check_interval = 15
log_level = "debug"
connection_max_age = "15m0s"
shutdown_timeout = 300

[session_server]
  session_timeout = 1800

[redacted]
  name = "Testing"
  url = "https://gitlab.com"
  id = 1234
  limit = 10
  request_concurrency = 10
  token = "glrt-token"
  executor = "docker-autoscaler"
  [runners.custom_build_dir]
  [runners.cache]
    Type = "s3"
    Shared = true
    MaxUploadedArchiveSize = 0
    [runners.cache.s3]
      BucketName = "my-bucket-123"
      BucketLocation = "eu-west-1"
  [runners.docker]
    tls_verify = false
    image = "busybox:latest"
    privileged = false
    services_privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    allowed_privileged_services = ["docker:.*dind"]
    allowed_pull_policies = ["always", "if-not-present"]
    shm_size = 0
    network_mtu = 0
  [runners.autoscaler]
    capacity_per_instance = 5
    max_use_count = 100
    max_instances = 2
    plugin = "fleeting-plugin-aws"
    [runners.autoscaler.plugin_config]
      name = "Autoscaling-group"

[[runners.autoscaler.policy]]
  idle_count = 0
  idle_time = "3m0s"
  scale_factor = 0.0
  scale_factor_limit = 0

Used GitLab Runner version

Version:      16.11.1
Git revision: 535ced5f
Git branch:   16-11-stable
GO version:   go1.21.9
Built:        2024-05-03T15:52:38+0000
OS/Arch:      linux/amd64

Also tested in 16.11.0

Possible fixes

Possibly related to #37526

Edited May 15, 2024 by Alisson Ramos de Oliveira