Remove double concurrency window during deployment
Problem
When both the blue and green deployments are active, we have two deployments of runner managers running. Running both at the same time doubles the effective concurrent setting. For example, if our total concurrent is set to 6500, then when we deploy green we end up with 13000 concurrent slots until we disable blue. This can be a problem when we have a large backlog of jobs and need to do a deployment to remediate the backlog, or when a spike hits during the switch over.
Doubling the capacity can cause us to hit GCP quota limits, or any other limit, because nothing caps the combined concurrency of the two deployments.
Proposal
Have a controlled and automated way of shifting load between the blue and green deployments. For example:
- When you first activate the green deployment, set the concurrent for green to 200 and reduce the blue deployment by 200.
- When you are confident in the deployment, continue shifting concurrent to green until green reaches the desired 100% capacity.
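The shifting steps above could be sketched roughly as follows. This is only an illustration of the arithmetic, not an existing tool: the function name `shift_schedule` and the `step` parameter are hypothetical, the total of 6500 and the step of 200 are taken from the examples in this issue, and actually applying the values to each runner manager is left out.

```python
from typing import Iterator, Tuple


def shift_schedule(total: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield (blue, green) concurrent values while moving capacity to green.

    Hypothetical helper: blue + green always equals `total`, so the
    combined capacity never exceeds the normal limit during the switch.
    """
    if total <= 0 or step <= 0:
        raise ValueError("total and step must be positive")
    green = 0
    while green < total:
        # Move one step to green, without overshooting the total.
        green = min(green + step, total)
        yield total - green, green


steps = list(shift_schedule(total=6500, step=200))
print(steps[0])   # (6300, 200): first activation of green
print(steps[-1])  # (0, 6500): green at 100% capacity, blue drained
```

Because each step only moves capacity from blue to green, the 13000-slot window described in the Problem section never appears. In practice the step size would probably grow as confidence in the deployment increases, rather than staying fixed at 200.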
Ideally, the shifting/canary should be part of the normal deploy process and not something that we do only in emergencies. We want shifting traffic to be "normal", rather than running a rarely used process during an incident, when you want everything to work.