Allow different scalability properties for a Runway project deployed to different regions
We currently allow some scalability properties to be configured. For example, in AI-gateway we have:
scalability:
  min_instances: 4
  max_instances: 200
  max_instance_request_concurrency: 40
I think the min_instances value there is probably over-provisioned even in our largest region, so it would make sense to make it configurable per region, so that quiet regions can run fewer idle instances.
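As a rough sketch of what per-region overrides could look like: the `regions` key, the region names, and the override value below are all hypothetical and only meant to illustrate the idea. None of them are existing Runway configuration. The top-level block keeps the current AI-gateway values as the default, and a region only lists the properties it wants to change.

```yaml
# Hypothetical schema, for illustration only. The `regions` key and the
# region names are assumptions, not an existing Runway option.
scalability:
  # Defaults for every region (the current AI-gateway values).
  min_instances: 4
  max_instances: 200
  max_instance_request_concurrency: 40
regions:
  example-busy-region: {}   # no overrides, inherits the defaults above
  example-quiet-region:
    scalability:
      min_instances: 1      # illustrative value: fewer idle instances
```

For scale: with the current values, the floor of 4 instances at 40 concurrent requests each reserves capacity for 160 concurrent requests in every region, whether or not that region sees the traffic.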
Alternatively, we could look into sensible defaults, together with scaling properties that trigger a scale-up at the right time, so that the same settings could be applied globally. We should then have appropriate documentation and tools to help service owners set these properties correctly.