Fix idle scale down by preventing reservations that exceed idle capacity
By design, capacity reservations prevent instances from being scaled down. What wasn't taken into consideration is that a high number of reservations effectively blocks scaling down altogether. This happens with high values of `request_concurrency` in Runner.
This MR ensures that when idle capacity exists, reservations cannot exceed the idle capacity minus `capacity_per_instance`. This still allows for fast scaling up, but uses the existing idle capacity gradually, and the reduction in reservations gives the scheduler time to remove instances that are not actually in use.
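
As a rough illustration of the rule above, here is a minimal sketch in Go. It is not the code in this MR: `canReserve` and its parameters are hypothetical names, and both clamping the limit at zero and leaving reservations uncapped when there is no idle capacity are assumptions made for the example.

```go
package main

import "fmt"

// canReserve reports whether one more reservation may be granted.
// Hypothetical helper for illustration only, not the function in this MR.
//
//   reserved            - reservations currently held
//   idleCapacity        - capacity that is currently idle
//   capacityPerInstance - capacity a single instance provides
func canReserve(reserved, idleCapacity, capacityPerInstance int) bool {
	if idleCapacity <= 0 {
		// Assumption: with no idle capacity the cap does not apply,
		// so scaling up is not slowed down.
		return true
	}

	// Reservations may not exceed idle capacity minus one instance's worth.
	limit := idleCapacity - capacityPerInstance
	if limit < 0 {
		// Assumption: a negative limit is treated as zero.
		limit = 0
	}
	return reserved < limit
}

func main() {
	// 10 idle slots, 4 slots per instance: the cap is 10 - 4 = 6.
	reserved := 0
	for canReserve(reserved, 10, 4) {
		reserved++
	}
	fmt.Printf("granted %d reservations\n", reserved) // prints: granted 6 reservations
}
```

In this sketch, one instance's worth of idle capacity is always left unreserved, which is what would give the scheduler room to remove an instance that is no longer needed.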
Edited by Arran Walker