Arran Walker requested to merge ajwalker/fix into main Feb 28, 2024

Problem

Scaling up and down is controlled by CapacityInfo and the RequiredInstances() function.

CapacityInfo has an InstanceCount variable that is used within the RequiredInstances() calculation. InstanceCount comprises "active" instances (state: creating, running) and instances that have been requested.

It doesn't include instances that are being removed. This is because we don't want to treat instances that are being removed as "active", otherwise we'd block the creation of new instances just because other instances are on their way out.

However, this leads to a problem within the calculation under the following scenarios:

The fleeting plugin hints that an instance is being deleted so that jobs are not scheduled to it. This is a valid behaviour, because taskscaler won't actually delete an instance if it has active jobs.
The fleeting plugin reports "Deleting" because the scaling group has begun deletion of an instance that we did not request.

When this occurs, because an instance in the deleting state isn't treated as active, we'll immediately scale up to accommodate reservations that are already backed by an active acquisition.

Solution

We add a simple "debounce" implementation that provides a count of instances that are both being deleted AND have existing acquisitions. This is added to the instance count.
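As a rough illustration of the idea, here is a minimal sketch. It is not the taskscaler implementation: the `Instance` type, the `State` constants, and the `required` function (a simplified stand-in for RequiredInstances()) are all assumptions made for the example.

```go
package main

import "fmt"

// State is a hypothetical instance state, named after the states mentioned above.
type State int

const (
	Creating State = iota
	Running
	Deleting
)

// Instance is a hypothetical, simplified view of a tracked instance.
type Instance struct {
	State        State
	Acquisitions int // jobs currently holding capacity on this instance
}

// activeCount counts instances considered "active" (creating or running).
func activeCount(instances []Instance) int {
	n := 0
	for _, inst := range instances {
		if inst.State == Creating || inst.State == Running {
			n++
		}
	}
	return n
}

// debouncing counts instances that are being deleted AND still have acquisitions.
func debouncing(instances []Instance) int {
	n := 0
	for _, inst := range instances {
		if inst.State == Deleting && inst.Acquisitions > 0 {
			n++
		}
	}
	return n
}

// required is an assumed, simplified stand-in for RequiredInstances():
// the number of additional instances needed to back `demand` jobs.
func required(demand, capacityPerInstance, instanceCount int) int {
	needed := (demand + capacityPerInstance - 1) / capacityPerInstance // ceiling division
	if needed <= instanceCount {
		return 0
	}
	return needed - instanceCount
}

func main() {
	instances := []Instance{
		{State: Running, Acquisitions: 1},
		{State: Deleting, Acquisitions: 1}, // deletion hinted, job still running
		{State: Deleting, Acquisitions: 0}, // genuinely on its way out
	}
	demand := 2 // both jobs are already backed by an acquisition

	without := required(demand, 1, activeCount(instances))
	with := required(demand, 1, activeCount(instances)+debouncing(instances))
	fmt.Println(without, with) // prints "1 0"
}
```

Without the debounce count, the sketch asks for one extra instance even though both jobs already have somewhere to run. With it, no scale-up is requested, while the idle deleting instance is still excluded from the count.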

In addition, we further strengthen the use of the deleting state as a hint that says "please don't schedule any more jobs to me" by populating the unavailable capacity. This allows scale events to occur early when they are actually required.

I've called the variable Debouncing, which is a little odd, because the calculation values are usually named after what they represent, not the behaviour they'll exhibit. But in this case, this felt like a more concise name than RemovedInstancesWithAcquisitions, and I think debouncing is maybe a good term for our ubiquitous language. But I'm open to suggestions!

Draft: Implement scale debouncing
