Introduce a new production/staging stage "onebox" that consists of one box of every single fleet type
This is a fairly common deploy strategy and something I think we should consider for GitLab.com. It would not require any additional infrastructure; instead, we would create a new deployment stage called "onebox" in addition to the "canary" stage.
As we construct a deployment pipeline using omnibus, this will help with stability and make rollbacks easier in lieu of proper blue/green deployments.
Why?
- How is this different than canary?
As we start deploying more frequently, we are going to want to see issues earlier so that we can roll back quickly when something goes wrong. The goal would be to deploy to a single box of each fleet type after canary to get an early look into the health of the release.
- Why don't we just expand canary so it includes sidekiq, gitaly, etc?
Canary is designed so that it can be shut off completely from production traffic. Until we have sidekiq namespacing or a way to run a canary gitaly, this is not possible for those fleet types.
- Does this change anything with migrations?
No, migrations will continue to run on the production cny stage.
Deployment pipeline
- NEW ---> staging onebox:
[ gitaly single server ]
- NEW ---> staging onebox:
[ api x1 ] [ git x1 ] [ web x1 ] [ sidekiq x1 ] [ registry x1 ] [ mailroom x1 ] [ web-pages x1 ]
- staging:
[ gitaly ]
- staging:
[ api ] [ git ] [ web ] [ sidekiq ] [ registry ] [ mailroom ] [ web-pages ]
- production cny:
[ api ] [ git ] [ web ]
- NEW ---> production onebox:
[ gitaly single server, isolate internal repos? ]
- NEW ---> production onebox:
[ api x1 ] [ git x1 ] [ web x1 ] [ sidekiq x1 ] [ registry x1 ] [ mailroom x1 ] [ web-pages x1 ]
- production:
[ gitaly ]
- production:
[ api ] [ git ] [ web ] [ sidekiq ] [ registry ] [ mailroom ] [ web-pages ]
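To make the ordering above easier to reason about, here is a minimal Python sketch that models the proposed pipeline as data. It is an illustration only, not how the deploy tooling is implemented: the `Stage` class, the `"main"` stage name for the full fleet, and the `stages_before` helper are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    env: str             # "staging" or "production"
    name: str            # "onebox", "cny", or "main" ("main" is a made-up label for the full fleet)
    fleets: tuple        # fleet types deployed in this step
    new: bool = False    # True for steps this proposal would add

FRONTEND = ("api", "git", "web")
FULL = FRONTEND + ("sidekiq", "registry", "mailroom", "web-pages")

# Same order as the pipeline listed above; gitaly goes first within each stage.
PIPELINE = [
    Stage("staging", "onebox", ("gitaly",), new=True),
    Stage("staging", "onebox", FULL, new=True),
    Stage("staging", "main", ("gitaly",)),
    Stage("staging", "main", FULL),
    Stage("production", "cny", FRONTEND),
    Stage("production", "onebox", ("gitaly",), new=True),
    Stage("production", "onebox", FULL, new=True),
    Stage("production", "main", ("gitaly",)),
    Stage("production", "main", FULL),
]

def stages_before(pipeline, env, name):
    """Return every step that runs before the first step of (env, name)."""
    for i, stage in enumerate(pipeline):
        if (stage.env, stage.name) == (env, name):
            return pipeline[:i]
    raise ValueError(f"unknown stage: {env} {name}")

if __name__ == "__main__":
    for stage in PIPELINE:
        marker = "NEW ---> " if stage.new else ""
        print(f"{marker}{stage.env} {stage.name}: {', '.join(stage.fleets)}")
```

Laid out this way, it is easy to see that production onebox runs only after production cny, and that it is the first production step to exercise sidekiq, registry, mailroom, web-pages, and gitaly.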
Additional Considerations
- We need to figure out how to target them properly for deployments and how to add prometheus labels
- We need to be able to differentiate them in sentry. https://gitlab.com/gitlab-org/gitlab-ce/issues/50892
- We need to be able to differentiate them in logging, probably something other than hostname
- We may want to take on changing the way we handle inventory and discovery, switching to consul.
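One possible way to address the targeting, metrics, and logging points above is to attach an explicit stage label to every host and select on it instead of on hostname. The sketch below is only an illustration under that assumption: the label keys (`env`, `stage`, `type`), the `"main"` stage value, the example hostnames, and the `select` helper are hypothetical and do not reflect the existing inventory.

```python
# Hypothetical inventory: each host carries explicit labels rather than
# encoding its stage in the hostname.
HOSTS = [
    {"fqdn": "web-01.example.internal", "env": "production", "stage": "onebox", "type": "web"},
    {"fqdn": "web-02.example.internal", "env": "production", "stage": "main", "type": "web"},
    {"fqdn": "web-cny-01.example.internal", "env": "production", "stage": "cny", "type": "web"},
    {"fqdn": "sidekiq-01.example.internal", "env": "production", "stage": "onebox", "type": "sidekiq"},
    {"fqdn": "sidekiq-02.example.internal", "env": "production", "stage": "main", "type": "sidekiq"},
]

def select(hosts, **wanted):
    """Return the hosts whose labels match every key/value pair in `wanted`."""
    return [h for h in hosts if all(h.get(k) == v for k, v in wanted.items())]

def log_fields(host):
    """Fields that could be attached to metrics, error reports, and log lines."""
    return {key: host[key] for key in ("env", "stage", "type")}

if __name__ == "__main__":
    for host in select(HOSTS, env="production", stage="onebox"):
        print(host["fqdn"], log_fields(host))
```

The same label set could then be reused as the deploy target selector and as the fields that distinguish onebox hosts in metrics, error tracking, and logs.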