Plan for environments and rollouts across all infrastructure
Overview
When making changes in production, we should ideally test them before applying them, especially if they are potentially destructive. Most changes are covered by rings; however, rings do not cover all of the types of changes listed below.
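A minimal sketch of a ring-by-ring rollout. It assumes that a ring is an ordered group of cells and that each change exposes hypothetical `apply`, `validate`, and `rollback` hooks; none of these names come from existing tooling. Rollback is included only to illustrate the idea, since implementing rollbacks in rings is itself listed as a Deployment Engine change.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


# Hypothetical model: a ring is an ordered group of cells that receives a
# change before the next ring does. Illustrative only, not existing tooling.
@dataclass
class Ring:
    name: str
    cells: List[str]


@dataclass
class RolloutResult:
    applied: List[str] = field(default_factory=list)      # rings that received the change
    rolled_back: List[str] = field(default_factory=list)  # rings reverted after a failure
    halted_at: Optional[str] = None                        # ring whose validation failed


def rollout(
    rings: List[Ring],
    apply: Callable[[Ring], None],
    validate: Callable[[Ring], bool],
    rollback: Callable[[Ring], None],
) -> RolloutResult:
    """Apply a change ring by ring, stopping at the first failed validation."""
    result = RolloutResult()
    for ring in rings:
        apply(ring)
        result.applied.append(ring.name)
        if not validate(ring):
            result.halted_at = ring.name
            # Revert every ring that received the change, newest first.
            for done in reversed(rings[: len(result.applied)]):
                rollback(done)
                result.rolled_back.append(done.name)
            break
    return result
```

Stopping at the first failed validation limits the impact to the rings already reached; changes that cannot be expressed per ring are the gap this plan is meant to address.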
Types of Changes
- Application changes, for example: deploying a new version of gitlab-org/gitlab
- Instrument changes, for example: changing the snapshot policy from 4h to 1h
- Cell Services changes, for example: changes configured on the HTTP router
- AMP changes, for example: upgrading the Kubernetes version
- Deployment Engine changes, for example: implementing rollbacks in rings
Problems to solve
- Partition user base for gradual rollouts
- Manage jitter (many changes, frequent changes, potential instability)
- Validate packages for self-managed users
- Unify the tooling for Dedicated, Cells, and GitLab
- Test integration with legacy infrastructure
- Avoid adding overhead or context overload for SREs making changes to infrastructure
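One common way to partition a user base for gradual rollouts is deterministic hashing: a stable identifier is hashed into a fixed number of buckets, and each ring owns a cumulative share of those buckets. The sketch below assumes the identifier is something like an organization or namespace ID, and the ring names and percentages in `RING_THRESHOLDS` are hypothetical values chosen for illustration.

```python
import hashlib

# Hypothetical ring sizes: cumulative share of the user base per ring (percent).
# Illustrative values only; the list must end at 100.
RING_THRESHOLDS = [("ring0", 1), ("ring1", 10), ("ring2", 50), ("ring3", 100)]


def bucket(identifier: str, buckets: int = 100) -> int:
    """Map a stable identifier (assumed: an organization or namespace ID) to 0..buckets-1."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo buckets.
    return int.from_bytes(digest[:8], "big") % buckets


def ring_for(identifier: str) -> str:
    """Return the first ring whose cumulative threshold covers the identifier's bucket."""
    b = bucket(identifier)
    for name, cumulative_percent in RING_THRESHOLDS:
        if b < cumulative_percent:
            return name
    raise ValueError("RING_THRESHOLDS must end at 100")
```

Because the assignment depends only on the identifier, the same user always lands in the same ring without a lookup table, and growing a ring only moves users forward into earlier exposure rather than reshuffling them.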
Action Items
- Walk through an example of how deployment works with Cells
- Walk through an example of a configuration change
- Walk through an example of an emergency fix
- Walk through an example of a deployment of a Cell Service
- Walk through an example of an AMP change
- Document each type of change and the solution for that change
Edited by Alessio Caiazza (OOO until 2024-08-26)