Make taskscaler reservation throttling configurable

Problem

Taskscaler supports "reserving" capacity that won't be removed until unreserved or used. This becomes problematic under specific conditions:

  • Long-polling is enabled on job requests
  • Job request concurrency is greater than 1
  • The Runner is fairly inactive

In this scenario, you might have 10 active requests long-polling for jobs, but no jobs arriving. This creates 10 reservations, and we don't want to remove the underlying capacity in case a job arrives for any of those requests.

Customers with idle capacity rules don't see capacity being removed after idle_time is exceeded, because we're holding reservations for potential jobs. To address this, we added reservation throttling: limiting the number of reservations so we could remove at least one idle instance. However, the throttling calculation we perform has proven to be more of a hindrance than a help.
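
To make the intent concrete, here is a minimal sketch of the kind of cap throttling implies. The names and logic are illustrative only, not taskscaler's actual code:

```go
// Hypothetical sketch: cap open reservations so that at least one idle
// instance remains eligible for removal once idle_time is exceeded.
package throttle

// reservationCap returns the maximum number of reservations to hold open,
// given the current idle instance count. Keeping the cap one below the idle
// count leaves at least one instance free to be scaled down.
func reservationCap(idleInstances int) int {
	if idleInstances <= 1 {
		return idleInstances
	}
	return idleInstances - 1
}

// throttled reports whether a new reservation should be refused.
func throttled(activeReservations, idleInstances int) bool {
	return activeReservations >= reservationCap(idleInstances)
}
```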

While we could improve the conditions under which reservation throttling is applied, we've already solved the root cause on the Runner side via adaptive request concurrency. Previously, if a Runner's request_concurrency was set to 10, it would always maintain 10 concurrent requests. Now that value acts as a maximum, and request concurrency scales up and down with the job arrival rate. This change naturally gives Taskscaler opportunities to remove idle instances.
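
For illustration, adaptive request concurrency behaves roughly like the following sketch, where the configured value is treated as an upper bound. This is not the Runner's actual implementation; the function and parameter names are assumptions:

```go
// Illustrative sketch only: derive the desired concurrency from the recent
// job arrival rate, never exceeding the configured request_concurrency.
package adaptive

// desiredConcurrency picks a concurrency level between 1 and maxConcurrency.
// jobsPerInterval is how many jobs arrived during the last sampling window;
// when few jobs arrive, fewer long-polling requests are kept open, which in
// turn leaves fewer reservations holding idle capacity.
func desiredConcurrency(maxConcurrency, jobsPerInterval int) int {
	want := jobsPerInterval + 1 // keep one spare request open for new work
	if want < 1 {
		want = 1
	}
	if want > maxConcurrency {
		want = maxConcurrency
	}
	return want
}
```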

Plan

  1. Make reservation throttling configurable
  2. Enable it by default in Runner
  3. Later, once we're confident it isn't needed, disable it by default in Runner

If we encounter situations where adaptive request concurrency isn't sufficient, customers can opt in to reservation throttling while we explore better detection mechanisms.
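
As a sketch of what "configurable" could look like, the option might be exposed as a tri-state toggle so the default can flip from enabled to disabled later without breaking explicit opt-ins. The option name and defaulting behaviour below are assumptions, not the final setting:

```go
// Hypothetical configuration sketch for the reservation-throttling toggle.
package config

// ScalerOptions carries autoscaling tuning knobs passed to taskscaler.
type ScalerOptions struct {
	// ReservationThrottling toggles the reservation-throttling calculation.
	// nil means "use the default", which starts as enabled and can later be
	// flipped to disabled once adaptive request concurrency proves sufficient.
	ReservationThrottling *bool
}

// reservationThrottlingEnabled resolves the tri-state option to a boolean.
func reservationThrottlingEnabled(o ScalerOptions) bool {
	if o.ReservationThrottling != nil {
		return *o.ReservationThrottling
	}
	return true // current default: enabled
}
```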