
Remove double concurrency window during deployment

Problem

When both the blue and green deployments are active, we have two deployments of runner managers running. Because they run at the same time, the effective concurrent setting is doubled. For example, if our total concurrent is set to 6500, deploying green leaves us with 13000 concurrent slots until we disable blue. This can be a problem when we have a large backlog of jobs and need to deploy to remediate it, or when there is a spike in jobs during the switchover.
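A minimal Python sketch of the arithmetic above, assuming each deployment is configured with the full total concurrent value while both are active. The function name switchover_peak is hypothetical and not part of the runner or any deployment tooling.

```python
def switchover_peak(total_concurrent: int) -> int:
    """Peak job slots while blue and green both run at full capacity.

    Assumption: both deployments are configured with the full total, so
    between activating green and disabling blue their slots add up.
    """
    blue = total_concurrent
    green = total_concurrent
    return blue + green


print(switchover_peak(6500))  # 13000
```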

Doubling the capacity can push us into GCP quota limits, or any other limit, because nothing caps the combined capacity of both deployments.

Proposal

Have a controlled and automated way of shifting load between the blue and green deployment. For example:

  1. When green is first activated, set its concurrent to 200 and reduce blue's concurrent by 200.
  2. Once we are confident in the deployment, continue shifting concurrent to green until green reaches the desired 100% capacity.
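A sketch of how such a shifting schedule could be computed, in Python. It assumes the same fixed step (200, taken from the first step above) is used for every step, which the proposal does not prescribe; shift_schedule is a hypothetical helper, and the "confident in the deployment" check between steps is left out. The property that matters is that blue + green always equals the total, so there is no doubled window.

```python
from typing import Iterator, Tuple


def shift_schedule(total: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield (blue, green) concurrent values for each shifting step.

    Each step moves `step` slots from blue to green, so blue + green
    always equals `total`. The last step may be smaller so that green
    lands exactly on `total` (100% capacity).
    """
    if total < 0 or step <= 0:
        raise ValueError("total must be >= 0 and step must be > 0")
    green = 0
    while green < total:
        green = min(green + step, total)
        yield total - green, green


steps = list(shift_schedule(total=6500, step=200))
print(steps[0])    # (6300, 200)
print(steps[-1])   # (0, 6500)
print(len(steps))  # 33
```

In a real rollout each yielded pair would be applied to the two deployments and held until the (hypothetical) health checks pass; the step size could also grow once confidence is built.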

Ideally, this shifting (canary) should be part of the normal deploy process rather than something we only do in emergencies. Shifting traffic should be routine, so that we are not running a rarely exercised process during an incident, exactly when we need everything to work.

Edited by Steve Xuereb