Fleet management v2.0
To meet the coming demand and scale for GitLab.com, and to provide a truly flexible and scalable architecture, we are changing the way we build and deploy our infrastructure.
You are encouraged to read (and contribute to) the detailed plan, as this meta-issue is only the actionable plan and assumes knowledge of some of the definitions in it.
Bah. I don't have the time to read all that. What are we going to do?
- Clearly define classes of machines (cattle, pets, snowflakes)
- Leverage build and deployment tools to make all 'cattle' machines ephemeral
- Move all secret storage into a solution-agnostic provider
Taking these steps allows us to begin early-stage use of 'canary' deployments and sets us up for a smooth transition to Kubernetes containerisation and deployment.
We will accomplish this by using Packer and chef-solo for all our cattle. Everything, including pets and snowflakes, will be managed by Terraform. All of our secrets will be stored in Vault, which will allow us to open source our chef-repo entirely.
The use of tagged, unambiguous images developed through this pipeline will enable us to do canary deployments and quickly promote code from testing, to stage, and on into production in a continuous manner.
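As a rough illustration of the promotion flow described above, here is a minimal Python sketch. The environment names follow the text. The `Image` class, the `promote` function, and the example image name and tag are hypothetical and do not correspond to any existing tooling or Packer API.

```python
from dataclasses import dataclass

# Promotion order taken from the text: testing -> stage -> production.
ENVIRONMENTS = ("testing", "stage", "production")


@dataclass(frozen=True)
class Image:
    """A tagged, immutable image (hypothetical model, not a Packer API)."""
    name: str
    tag: str  # unambiguous tag, for example a commit SHA
    environment: str = "testing"


def promote(image: Image) -> Image:
    """Return the same image (same name and tag) placed in the next environment."""
    index = ENVIRONMENTS.index(image.environment)
    if index == len(ENVIRONMENTS) - 1:
        raise ValueError(f"{image.name}:{image.tag} is already in production")
    return Image(image.name, image.tag, ENVIRONMENTS[index + 1])


if __name__ == "__main__":
    image = Image("web", "3f2a9c1")  # made-up name and tag
    while image.environment != "production":
        image = promote(image)
        print(f"{image.name}:{image.tag} -> {image.environment}")
```

The point of the sketch is that promotion never rebuilds anything: the tag stays fixed and only the target environment changes, which is what makes a canary comparable to what later reaches production.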
So what do we need to do?
- Enable our chef repository to build VM and container images - https://gitlab.com/gitlab-com/infrastructure/issues/1211
- Use service discovery to configure the fleet in real time instead of using chef-client - https://gitlab.com/gitlab-com/infrastructure/issues/1214
- Move our secrets from chef-vault to an agnostic provider solution - https://gitlab.com/gitlab-com/infrastructure/issues/1212
- Automate the lifecycle of staging environments - https://gitlab.com/gitlab-com/infrastructure/issues/1040
- Automate db migration in a safe, isolated environment - https://gitlab.com/gitlab-com/infrastructure/issues/1215
- Deploy feature branches in staging using review apps - https://gitlab.com/gitlab-com/infrastructure/issues/1035
- Automate the production lifecycle - https://gitlab.com/gitlab-com/infrastructure/issues/1216
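To make the "agnostic provider" item above more concrete, here is a minimal Python sketch of what a provider-agnostic secrets interface could look like. Everything in it is an assumption made for illustration: the `SecretProvider` interface, both implementations, the `render_db_url` consumer, and the host name are hypothetical. A Vault-backed provider would implement the same interface, but it is not shown here to avoid guessing at a client API.

```python
import os
from abc import ABC, abstractmethod


class SecretProvider(ABC):
    """Interface the rest of the tooling depends on (hypothetical)."""

    @abstractmethod
    def get(self, key: str) -> str:
        """Return the secret stored under `key`, or raise KeyError."""


class EnvSecretProvider(SecretProvider):
    """Reads secrets from environment variables; handy for local testing."""

    def __init__(self, prefix: str = "SECRET_"):
        self.prefix = prefix

    def get(self, key: str) -> str:
        # os.environ raises KeyError when the variable is missing.
        return os.environ[self.prefix + key.upper()]


class InMemorySecretProvider(SecretProvider):
    """Holds secrets in a dict; a stand-in for a real backend."""

    def __init__(self, secrets: dict[str, str]):
        self._secrets = dict(secrets)

    def get(self, key: str) -> str:
        return self._secrets[key]


def render_db_url(provider: SecretProvider) -> str:
    """Example consumer: builds a connection string without knowing the backend."""
    password = provider.get("db_password")
    return f"postgres://gitlab:{password}@db.example.internal/gitlab"
```

Because consumers only see `SecretProvider`, swapping the backend means changing which provider is constructed and nothing else, and no secret values need to live in the repository that holds the consuming code.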