Announcement: GitLab.com deployment pipeline re-order to support mixed-deployment testing
What’s changing?
The auto-deploy process for rolling out code changes to GitLab.com is changing. Previously, new packages were deployed to staging, then production-canary, then production, with QA suites run against each environment after deployment. With this change, staging gains a new staging-canary stage, which is now the first environment to receive a deployment. After this, code goes directly to production-canary, and only after baking there for 30 minutes is it deployed to staging and production almost simultaneously.
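The old and new orders described above can be written down as data for a side-by-side comparison. This is only an illustrative sketch: the `Step` class, the `OLD_ORDER` and `NEW_ORDER` lists, and the `describe` helper are hypothetical names invented for this example and are not part of release-tools or the actual pipeline configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One pipeline step (hypothetical model, not the real pipeline config)."""
    environments: tuple[str, ...]  # environments deployed at (almost) the same time
    bake_minutes: int = 0          # wait on the previous step before this one starts


# Previous order: staging, then production-canary, then production.
OLD_ORDER = [
    Step(("staging",)),
    Step(("production-canary",)),
    Step(("production",)),
]

# New order: staging-canary first, then production-canary, then (after a
# 30 minute bake on production-canary) staging and production together.
NEW_ORDER = [
    Step(("staging-canary",)),
    Step(("production-canary",)),
    Step(("staging", "production"), bake_minutes=30),
]


def describe(order):
    """Return one human-readable line per step, in deployment order."""
    lines = []
    for number, step in enumerate(order, start=1):
        envs = " + ".join(step.environments)
        wait = f" after a {step.bake_minutes} minute bake" if step.bake_minutes else ""
        lines.append(f"{number}. deploy to {envs}{wait}, then run QA")
    return lines


if __name__ == "__main__":
    print("Old order:")
    print("\n".join(describe(OLD_ORDER)))
    print("New order:")
    print("\n".join(describe(NEW_ORDER)))
```

Both lists have three steps, which mirrors the point that only the order of deployments changes, not the number of stages a package passes through.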
Why are we changing the order?
Our production deployments pass through a production-canary stage before being deployed to production, but we don't currently have the same deployment setup in our staging environment. By re-ordering the deployment pipeline to make use of our new staging-canary environment, we can test for problems caused by multiple versions of GitLab running in the same environment.
As an engineer, how does this impact me?
Staging-canary will replace Staging as the place to enable feature flags and to test any other changes that you previously tested on Staging.
For a detailed overview of the canary stage of the staging environment and how to access it, please read the dedicated canary documentation. Otherwise, there is no change to deployment velocity or to mean time to production. This change is purely to the order in which deployments take place.
Because the order of the pipeline has changed, the labels and comments that release-tools automation adds to your issues and MRs will also differ from before, so that they match the new deployment order (they should match the order you see in the diagram above).
How does this impact QA tests?
This does not impact QA tests beyond an expansion in their scope: staging-canary has an additional test suite to provide mixed-deployment testing. Test suites still run at the same time relative to when the deployment for an environment takes place.
Where can I find more in-depth information about this change and the reasons behind it?
- This epic details the need to improve staging
- This epic tracks all the Infra work related to this change
- This issue details the new pipeline state and why we chose that solution
- This issue contains the rollout and testing plan
Where can I ask questions or provide feedback?
Please comment on this issue directly with your questions and feedback, and we can provide answers there. If it is urgent, please reach out to #g_delivery in Slack.