Trigger separate deployments for each environment
Proposal
Release tools should trigger each environment deployment as a distinct job.
To roll out this change safely, we will continue triggering deployer on the master branch with the old multi-env deployment and, in parallel, trigger the new deployments on the next-gen branch, which defaults to check_mode.
Here follows a brief diagram of the new pipeline:
stateDiagram-v2
[*] --> tag
state fork_state <<fork>>
tag --> fork_state
fork_state --> wait_cng
fork_state --> wait_omnibus
state join_state <<join>>
wait_cng --> join_state
wait_omnibus --> join_state
join_state --> trigger_gstg
join_state --> trigger_complete_deployment
note right of trigger_complete_deployment : already implemented up to here
trigger_gstg --> trigger_cny
state fork_state_cny <<fork>>
trigger_cny --> fork_state_cny
fork_state_cny --> baking_time
fork_state_cny --> manual_promotion
state join_state_cny <<join>>
baking_time --> join_state_cny
manual_promotion --> join_state_cny
note right of baking_time : no longer a trigger
note right of manual_promotion : no longer a trigger
join_state_cny --> trigger_gprd
trigger_gprd --> [*]
Baking time and manual promotion will be moved to release-tools as well (both currently trigger release-tools from deployer), since they are only available during a deployer multi-env pipeline.
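The ordering in the diagram can be read as a dependency graph. The sketch below is only an illustration of that reading, not how release-tools is implemented: the job names come from the diagram, while the `PIPELINE` mapping and the `execution_order` helper are hypothetical names introduced here. It assumes both branches of each fork must finish before the join continues, as the diagram shows.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each job maps to the set of jobs that must finish before it can start.
# Job names are copied from the diagram; the mapping itself is illustrative.
PIPELINE = {
    "wait_cng": {"tag"},
    "wait_omnibus": {"tag"},
    "trigger_gstg": {"wait_cng", "wait_omnibus"},
    "trigger_complete_deployment": {"wait_cng", "wait_omnibus"},
    "trigger_cny": {"trigger_gstg"},
    "baking_time": {"trigger_cny"},
    "manual_promotion": {"trigger_cny"},
    "trigger_gprd": {"baking_time", "manual_promotion"},
}


def execution_order(pipeline):
    """Return one valid order in which the jobs could run."""
    return list(TopologicalSorter(pipeline).static_order())


order = execution_order(PIPELINE)
print(order)
```

Any order returned respects the forks and joins: `tag` runs first, and `trigger_gprd` never runs before `baking_time` and `manual_promotion`.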
We should evaluate if we can make use of multi-project pipeline status mirroring instead of active waiting.
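For comparison, active waiting amounts to polling the downstream pipeline until it reaches a final status. The following is a minimal sketch under stated assumptions: `wait_for_pipeline` and its parameters are hypothetical, and `fetch_status` stands in for whatever call returns the pipeline status (it is injected so the sketch runs without network access). The status strings are the ones GitLab reports for pipelines.

```python
import time

# Pipeline statuses after which no further change is expected.
TERMINAL_STATUSES = {"success", "failed", "canceled", "skipped"}


def wait_for_pipeline(fetch_status, interval=30.0, timeout=3600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a terminal status.

    fetch_status is a caller-supplied function (hypothetical here) that
    returns the current status string of the downstream pipeline.
    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"pipeline still {status!r} after {timeout}s")
        sleep(interval)


# Usage with canned responses instead of a real API call.
fake_statuses = iter(["pending", "running", "running", "success"])
result = wait_for_pipeline(lambda: next(fake_statuses), interval=0)
print(result)
```

Unlike status mirroring, the waiting job occupies a runner for the whole duration, which is the trade-off this evaluation is about.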
Exit criteria
Individual deployments for staging, canary and production should be triggered by release tools
To do
- Evaluate whether multi-project pipeline status mirroring can be used instead of active waiting - Since the next stage jobs are skipped if the child jobs are retried, active waiting will be used in this iteration, see #1578 (comment 520398890) for more details
- Trigger a separate deployment for gstg
- Trigger a separate deployment for cny
- Baking time moved to release-tools
- Trigger a separate deployment for production
- Add a manual build to trigger production checks and then trigger a production deployment.
- Testing
- Create follow-ups
Implementation steps
- Extract deployer logic so it can be re-used across multiple classes - gitlab-org/release-tools!1385 (merged)
- Trigger a deployment to staging and canary using the next-gen branch and the DEPLOY_ENVIRONMENT variable - gitlab-org/release-tools!1386 (merged)
- Move baking time to release tools - gitlab-org/release-tools!1398 (merged)
- Trigger an individual deployment to gprd - gitlab-org/release-tools!1418 (merged)
- Add a manual build to promote to prod. This one should run production checks, log the results in the monthly issue and then trigger an individual deployment to gprd - gitlab-org/release-tools!1420 (merged)
Testing
Process
- Prepare strategy
  - Currently, the TRIGGER_REF env variable is used to indicate which deployer branch we use to perform deployments.
  - master is used for legacy deployments (gprd, gprd-cny, gstg) and next-gen is used for individual deployments.
  - For testing we need to switch the deployer branch: master will be used for individual deployments and next-gen will be used for legacy deployments.
  - MR with the implementation: gitlab-org/release-tools!1423 (merged)
- Cherry-pick a commit in the auto-deploy branch or create a new auto-deploy branch. Commit cherry-picked: d5857130d59f3d7751d9c6f9471107148bab53d5
- Add the INDIVIDUAL_DEPLOYMENTS environment variable to release-tools
- Ensure the release-tools pipeline is generated correctly
- Ensure the individual deployments to production are executed correctly.
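The branch switch used for testing can be pictured as a small lookup. This is a sketch only, not the code from gitlab-org/release-tools!1423. It assumes the presence of INDIVIDUAL_DEPLOYMENTS is what activates the switch, and `deployer_refs` is a hypothetical helper returning the value TRIGGER_REF would take for each kind of deployment.

```python
def deployer_refs(env):
    """Return the deployer branch for each kind of deployment.

    Assumption: the presence of INDIVIDUAL_DEPLOYMENTS (any value) activates
    the testing setup described above; without it the default mapping applies.
    """
    if "INDIVIDUAL_DEPLOYMENTS" in env:
        # Testing: individual deployments run for real on master,
        # legacy multi-env deployments move to next-gen.
        return {"individual": "master", "legacy": "next-gen"}
    # Default: legacy deployments on master, individual ones on next-gen.
    return {"individual": "next-gen", "legacy": "master"}


print(deployer_refs({}))
print(deployer_refs({"INDIVIDUAL_DEPLOYMENTS": "true"}))
```

Keying the switch on a single variable keeps the rollback simple: removing the variable restores the default mapping.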
Notes
- First round of testing was partially successful. We missed some notifications due to a missing environment variable.
- Second round of testing was successful #1578 (comment 556457618). Minor follow-ups to be created:
  - Rename Pipeline: deployer to Pipeline: Release Tools
  - Ignore gprd-checks when triggering an individual deployment
  - Remove the INDIVIDUAL_DEPLOYMENTS environment variable
What to do if something goes wrong?
- Remove the INDIVIDUAL_DEPLOYMENTS environment variable from release-tools https://ops.gitlab.net/gitlab-org/release/tools/-/settings/ci_cd. Subsequent packages will use the legacy deployer pipeline, or,
- Triggering an individual deployment manually is also an option, e.g.
/chatops run deploy <package> --production
Development log
- March 4th, 2021 - Call to discuss this issue https://youtu.be/xI2MHSt5Y0Y. Summarized:
  - Multi-project pipeline status mirroring has a bug preventing us from using this feature. To unblock this issue we'll use an active waiting strategy.
  - The purpose of &154 is to have a single pipeline that coordinates release pipelines. For this issue, we're going to trigger individual deployment pipelines for each environment.
  - As a safety net, we will trigger the new deployment strategy in parallel to the current multi-env deployment.
  - Waiting time for staging and canary can use a similar strategy to the Omnibus and CNG waiting time: a delayed pipeline (time will depend on the environment) and active polling.
  - Additional jobs (QA, Slack notifications, etc.) will remain in deployer; they can be moved in a future iteration.
- March 7th, 2021 - Deployer logic extracted into a module gitlab-org/release-tools!1385 (merged)
- March 9th, 2021 - Logic to trigger individual deployments to staging and canary implemented on gitlab-org/release-tools!1386 (merged)
- March 10th, 2021 - MR gitlab-org/release-tools!1386 (merged) sent to review.
- March 17th, 2021
  - gitlab-org/release-tools!1386 (merged) was merged
  - A typo was noticed on the environment variables used by the auto_deploy:deploy task gitlab-org/release-tools!1395 (merged)
- March 18th, 2021
  - A release tools pipeline triggered a deployment to all our environments and individual deployments in check mode https://ops.gitlab.net/gitlab-org/release/tools/-/pipelines/518798
  - A merge request adding the baking-time job to release-tools was sent to review gitlab-org/release-tools!1398 (comment 532751714)
- April 2nd, 2021
  - gitlab-org/release-tools!1386 (merged) was merged
  - An error was reported on auto_deploy:wait:cny - #1578 (comment 543877376)
  - MR was submitted to fix the error gitlab-org/release-tools!1416 (merged)
  - auto_deploy:wait:cny is working again https://ops.gitlab.net/gitlab-org/release/tools/-/jobs/3529393 (this one failed since the deployer pipeline also failed https://ops.gitlab.net/gitlab-com/gl-infra/deployer/-/jobs/3530699).
- April 5th, 2021
  - https://ops.gitlab.net/gitlab-org/release/tools/-/pipelines/543575 release tools pipeline with baking time 🎉
  - auto_deploy:baking_time failed silently with 2021-04-05 12:52:29.683916 E [dry-run] ReleaseTools::Promotion::Checks::GitlabDeploymentHealth -- Cannot detect gitlab deployment health -- {:error=>#<HTTP::ConnectionError: failed to connect: Operation timed out - connect(2) for "thanos-query-frontend-internal.ops.gke.gitlab.net" port 9090>}
  - MR to fix the above failure gitlab-org/release-tools!1417 (merged)
  - MR to trigger an individual deployment to prod was submitted gitlab-org/release-tools!1418 (merged)
- April 6th, 2021
- April 7th, 2021
- April 8th, 2021
  - Release tools pipeline successfully triggered an individual deployment to prod https://ops.gitlab.net/gitlab-org/release/tools/-/pipelines/548947 / https://ops.gitlab.net/gitlab-com/gl-infra/deployer/-/pipelines/549738
- April 9th, 2021
  - MR that prepares a test strategy was submitted gitlab-org/release-tools!1423 (merged)
- April 19th, 2021
  - gitlab-org/release-tools!1423 (merged) was merged
  - First round of testing started #1578 (comment 555309879). There was a bug related to Slack notifications and messages being posted on the monthly issue. Should be fixed by gitlab-org/release-tools!1428 (merged)
- April 20th, 2021
  - Second round of testing was successfully completed #1578 (comment 556457618)
Follow ups
- Update links to point to release-tools instead of deployer - #1687 (closed)
- Ignore gprd-checks when the deployment is triggered from the coordinator-pipeline - #1688 (closed)
- Remove legacy deployment - #1689 (closed)
- Remove test environment variable - #1690 (closed)
Edited by Mayra Cabrera