Autoscale VMs based on a percentage of in-use VMs (!3179) · GitLab.org / gitlab-runner

Elliot Rushton requested to merge percent-based-autoscaling into main Oct 26, 2021

What does this MR do?

Adds an IdleScaleFactor option, which sets the number of idle VMs as a factor of the number of in-use VMs.

For example, with IdleScaleFactor=0.2 and 200 in-use VMs, Runner should maintain 200 * 0.2 = 40 idle VMs. All other autoscaling rules still apply: the number of idle VMs will not exceed the configured IdleCount, the total number of VMs will not exceed limit (if set to more than 0), and the number of machines being created at the same time will not exceed MaxGrowthRate (if set to more than 0).
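To make the interaction of these rules concrete, here is a minimal Go sketch. Only the setting names (IdleCount, IdleScaleFactor, limit, MaxGrowthRate) come from this MR; the struct, the functions targetIdle and toCreate, the truncation of the fractional part of the product, and the choice to count VMs still being created toward the idle pool are all assumptions for illustration, not the gitlab-runner implementation.

```go
package main

import "fmt"

// autoscalingConfig is a hypothetical struct holding the settings named in the MR.
type autoscalingConfig struct {
	IdleCount       int     // upper bound on idle VMs
	IdleScaleFactor float64 // idle VMs as a factor of in-use VMs; 0 disables it
	Limit           int     // max total VMs ("limit"); 0 means unlimited
	MaxGrowthRate   int     // max VMs in creation at once; 0 means unlimited
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// targetIdle returns IdleScaleFactor * inUse (fraction truncated, an assumption),
// never more than IdleCount. Without a factor, IdleCount is used as a static value.
func targetIdle(cfg autoscalingConfig, inUse int) int {
	if cfg.IdleScaleFactor <= 0 {
		return cfg.IdleCount
	}
	target := int(float64(inUse) * cfg.IdleScaleFactor)
	return minInt(target, cfg.IdleCount)
}

// toCreate returns how many new VMs to start now. Assumption: VMs that are
// still being created already count toward the idle pool.
func toCreate(cfg autoscalingConfig, inUse, idle, creating int) int {
	n := targetIdle(cfg, inUse) - (idle + creating)
	if cfg.Limit > 0 {
		// Total VMs must stay within limit.
		n = minInt(n, cfg.Limit-(inUse+idle+creating))
	}
	if cfg.MaxGrowthRate > 0 {
		// VMs in creation at once must stay within MaxGrowthRate.
		n = minInt(n, cfg.MaxGrowthRate-creating)
	}
	if n < 0 {
		return 0
	}
	return n
}

func main() {
	cfg := autoscalingConfig{IdleCount: 100, IdleScaleFactor: 0.2, Limit: 230, MaxGrowthRate: 10}
	fmt.Println(targetIdle(cfg, 200))      // 40
	fmt.Println(toCreate(cfg, 200, 15, 5)) // 5: the factor asks for 20 more, limit allows 10, MaxGrowthRate allows 5
}
```

In this sketch the factor only sets the target; limit and MaxGrowthRate then trim how many VMs are actually started in a given step.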

To prevent a deadlock (caused by low-level details of the autoscaling algorithm's implementation) and to give better control over readiness for handling the initial load, the MR also introduces IdleCountMin, the minimum number of idle VMs to maintain when IdleScaleFactor is in use.
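One consequence of a pure factor is simple arithmetic: with zero in-use VMs, IdleScaleFactor * 0 gives a target of zero idle VMs. The hypothetical Go function targetIdleWithMin sketches IdleCountMin as a floor on that target. It assumes IdleCount still acts as the upper bound and that the fractional part of the product is truncated; neither detail is taken from the MR.

```go
package main

import "fmt"

// targetIdleWithMin is a hypothetical helper, not gitlab-runner code.
// It applies IdleCountMin as a floor and IdleCount as a ceiling (assumed order).
func targetIdleWithMin(idleCount, idleCountMin int, factor float64, inUse int) int {
	target := int(float64(inUse) * factor) // fractional part truncated (assumption)
	if target < idleCountMin {
		target = idleCountMin // keep some VMs ready even when nothing is in use
	}
	if target > idleCount {
		target = idleCount // IdleCount remains the upper bound
	}
	return target
}

func main() {
	fmt.Println(targetIdleWithMin(40, 0, 0.2, 0))   // 0: a pure factor keeps no idle VMs
	fmt.Println(targetIdleWithMin(40, 2, 0.2, 0))   // 2: IdleCountMin floor
	fmt.Println(targetIdleWithMin(40, 2, 0.2, 50))  // 10: factor applies
	fmt.Println(targetIdleWithMin(40, 2, 0.2, 400)) // 40: capped by IdleCount
}
```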

The MR also updates the documentation about autoscaling to describe this experimental feature.

Why was this MR needed?

To make autoscaling more dynamic. Currently, IdleCount is a static number, and in some cases it has to be set to unreasonably large values to handle rare spikes of jobs in the queue, which generates additional cost.

With the number of idle VMs based on the number of in-use VMs, autoscaling should follow the real load more closely. Users can still define a static number of idle VMs where needed, by using Autoscaling Periods or by leaving IdleScaleFactor undefined.

What's the best way to test this MR?

Register a runner built from this MR in a test project, create a pipeline with many jobs, experiment with the parameters, and observe how Runner adjusts to the load.

What are the relevant issue numbers?

Related to #28052 (closed)

Edited Nov 04, 2021 by Tomasz Maczukin
