Draft: Implement scale debouncing
Problem
Scaling up and down is controlled by CapacityInfo and the RequiredInstances() function.
CapacityInfo has an InstanceCount variable that is used within the RequiredInstances() calculation. InstanceCount counts "active" instances (state: creating, running) and instances that have been requested.
It doesn't include instances that are being removed. If we treated those as "active", we'd block the creation of new instances just because other instances are on their way out.
However, this leads to a problem within the calculation under the following scenarios:
- The fleeting plugin hints that an instance is being deleted so that jobs are not scheduled to it. This is a valid behaviour, because taskscaler won't actually delete an instance if it has active jobs.
- The fleeting plugin reports "Deleting" because the scaling group has begun deletion of an instance that we did not request.
When this occurs, because an instance in the deleting state isn't treated as active, we'll immediately scale up to accommodate reservations that are already backed by an active acquisition.
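To make the over-scaling concrete, here is a minimal sketch of the situation. It is not the taskscaler code: the State and Instance types, activeCount and the simplified requiredInstances formula (ceiling of demand over per-instance capacity, minus the instance count) are all assumptions made for illustration.

```go
package main

import "fmt"

// State is a hypothetical stand-in for an instance state.
type State string

const (
	Creating State = "creating"
	Running  State = "running"
	Deleting State = "deleting"
)

// Instance is a hypothetical, simplified view of a tracked instance.
type Instance struct {
	State        State
	Acquisitions int // jobs currently holding capacity on this instance
}

// activeCount counts only creating/running instances, as the
// description says InstanceCount does. Deleting instances are skipped.
func activeCount(instances []Instance) int {
	n := 0
	for _, inst := range instances {
		if inst.State == Creating || inst.State == Running {
			n++
		}
	}
	return n
}

// requiredInstances is an assumed, simplified calculation: the number of
// instances needed to cover demand, minus the instances already counted.
func requiredInstances(demand, capacityPerInstance, instanceCount int) int {
	needed := (demand + capacityPerInstance - 1) / capacityPerInstance
	if needed <= instanceCount {
		return 0
	}
	return needed - instanceCount
}

func main() {
	instances := []Instance{
		{State: Running, Acquisitions: 1},
		// Reported as deleting, but still running a job.
		{State: Deleting, Acquisitions: 1},
	}

	// Both acquisitions count towards demand, but only one instance is
	// counted, so an extra instance is requested for a job that is
	// already being served.
	demand := 2
	fmt.Println(requiredInstances(demand, 1, activeCount(instances))) // prints 1
}
```

In this sketch the second job is already running on the deleting instance, yet the calculation asks for one more instance.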
Solution
We add a simple "debounce" implementation that provides a count of instances that are both being deleted AND have existing acquisitions. This is added to the instance count.
In addition, we strengthen the use of the deleting state as a hint that says "please don't schedule any more to me" by populating the unavailable capacity. This allows scale events to occur early when they are actually required.
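The two changes described above could be sketched roughly as follows. This is an illustration under assumptions, not the implementation in this MR: the Counts struct, summarize, and the way unavailable capacity is added to demand are hypothetical, and only the names InstanceCount and Debouncing come from the description.

```go
package main

import "fmt"

// State is a hypothetical stand-in for an instance state.
type State string

const (
	Creating State = "creating"
	Running  State = "running"
	Deleting State = "deleting"
)

// Instance is a hypothetical, simplified view of a tracked instance.
type Instance struct {
	State        State
	Acquisitions int // jobs currently holding capacity on this instance
	Capacity     int // total job slots on this instance
}

// Counts is an assumed summary of the values fed into the calculation.
type Counts struct {
	InstanceCount int // creating + running instances
	Debouncing    int // deleting instances that still have acquisitions
	Unavailable   int // free slots on debouncing instances; not schedulable
}

func summarize(instances []Instance) Counts {
	var c Counts
	for _, inst := range instances {
		switch inst.State {
		case Creating, Running:
			c.InstanceCount++
		case Deleting:
			// Only deleting instances that still back an acquisition
			// are debounced; idle deleting instances are ignored.
			if inst.Acquisitions > 0 {
				c.Debouncing++
				if free := inst.Capacity - inst.Acquisitions; free > 0 {
					c.Unavailable += free
				}
			}
		}
	}
	return c
}

// requiredInstances is an assumed, simplified calculation. Debouncing is
// added to the instance count, and unavailable slots are added to demand
// so that they are not mistaken for free capacity.
func requiredInstances(demand, capacityPerInstance int, c Counts) int {
	needed := (demand + c.Unavailable + capacityPerInstance - 1) / capacityPerInstance
	have := c.InstanceCount + c.Debouncing
	if needed <= have {
		return 0
	}
	return needed - have
}

func main() {
	c := summarize([]Instance{
		{State: Running, Acquisitions: 2, Capacity: 2},
		{State: Deleting, Acquisitions: 1, Capacity: 2},
	})

	// Three jobs are already running: no scale-up is needed.
	fmt.Println(requiredInstances(3, 2, c)) // prints 0

	// A fourth job arrives. The free slot on the deleting instance is
	// unavailable, so a new instance is requested straight away.
	fmt.Println(requiredInstances(4, 2, c)) // prints 1
}
```

Without the unavailable capacity, the fourth job in this sketch would appear to fit on the deleting instance and no scale-up would be triggered until that instance disappeared.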
I've called the variable Debouncing, which is a little odd, because the calculation values are usually named after what they represent, not the behaviour they'll exhibit. But in this case, this felt like a more concise name than RemovedInstancesWithAcquisitions, and I think debouncing is maybe a good term for our ubiquitous language. But I'm open to suggestions!