Make Container Registry changes roll out with a Canary
Corrective action suggested from production#14263 (closed)
One of the issues we have with Registry deployments is that there is no coordinated deploy pipeline; version bumps are done in MRs. Typically we open one MR for non-prod and one MR for prod, e.g.
- Pre and Staging: Bump Container Registry to v3.... (gitlab-com/gl-infra/k8s-workloads/gitlab-com!2714 - merged)
- Prod: Bump Container Registry to v3.73.0-gitlab (gitlab-com/gl-infra/k8s-workloads/gitlab-com!2715 - merged)
Then, it's up to someone in Infra to merge these in a way that ensures we wait long enough on Staging. There are a couple of problems with this:
- Nothing enforces how long we wait between non-prod and prod
- There is no QA between non-prod and prod, and non-prod doesn't get much traffic
- Canary is grouped with production. I think it's likely that we would have noticed this issue in Canary if we had deployed to it separately.
Canary is grouped with production because we don't set an explicit registry_version for it here: https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/b3ed0aec9337b8ffd99592db97dba52020adbf15/bases/gprd.yaml#L23
Let's use this issue to make sure that all Registry deployments go through a Canary stage, and stay there for at least 30 minutes for baking.
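To make the baking requirement concrete, here is a minimal sketch of the kind of gate a pipeline could apply before promoting a Registry version from Canary to production. It is illustrative only: the function name, its parameters, and the idea of tracking a Canary deploy timestamp are assumptions, not part of the existing k8s-workloads tooling.

```python
from datetime import datetime, timedelta, timezone

# Minimum time a version must run in Canary before production (from this issue).
MIN_BAKE = timedelta(minutes=30)


def can_promote_to_prod(canary_version, prod_target_version, canary_deployed_at, now=None):
    """Hypothetical gate: allow a production bump only if the same version
    has been running in Canary for at least MIN_BAKE."""
    now = now or datetime.now(timezone.utc)
    # The version going to production must be the one that baked in Canary.
    if canary_version != prod_target_version:
        return False
    return now - canary_deployed_at >= MIN_BAKE


if __name__ == "__main__":
    deployed = datetime.now(timezone.utc) - timedelta(minutes=45)
    print(can_promote_to_prod("v3.73.0-gitlab", "v3.73.0-gitlab", deployed))
```

A check like this only enforces the wait; it does not replace watching Canary metrics during the bake period.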