Allow different scalability properties for a runway project deployed to different regions

We currently allow some scalability properties to be configured per service. For example, AI-gateway has:

  scalability:
    min_instances: 4
    max_instances: 200
    max_instance_request_concurrency: 40

I think the min_instances value there is probably over-provisioned even in our largest region, so it would make sense to make these properties configurable per region. Quiet regions could then run with fewer idle instances.
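
As a rough sketch of what this could look like (the `regions` key, the region names, and all override values below are hypothetical and only for illustration; none of this exists in Runway today), the top-level values would act as defaults and each region would override only what it needs:

  scalability:
    # Defaults, applied to every region unless overridden
    min_instances: 4
    max_instances: 200
    max_instance_request_concurrency: 40
    # Hypothetical per-region overrides; key name and values are illustrative
    regions:
      us-east1:
        min_instances: 4
      asia-northeast1:
        min_instances: 1
        max_instances: 50

Falling back to the top-level values would keep existing configurations valid without changes, since a service that sets no overrides behaves exactly as it does now.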

Alternatively, we could look into sensible defaults, together with scaling properties that trigger a scale-up when needed, so that the same settings could be applied globally. We would then need documentation and tools to help service owners set these properties correctly.