Trigger separate deployments for each environment

Proposal

Release tools should trigger each environment deployment as a distinct job.

To roll out this change safely, we will continue triggering deployer on the master branch with the old multi-env deployment. In parallel, we will trigger the new deployments on the next-gen branch, which defaults to check_mode.
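As a rough illustration of the parallel rollout, the sketch below lists the deployer triggers a single package could produce: one legacy multi-env trigger on master, plus one trigger per environment on next-gen. The `Trigger` class, the `plan_triggers` helper and the `DEPLOY_VERSION` variable name are hypothetical and only serve the example; `DEPLOY_ENVIRONMENT` and the branch names come from this issue. This is not the release-tools implementation.

```python
from dataclasses import dataclass, field

# Environment names as used elsewhere in this issue.
ENVIRONMENTS = ["gstg", "gprd-cny", "gprd"]


@dataclass
class Trigger:
    """One downstream deployer pipeline to start (hypothetical structure)."""

    ref: str  # deployer branch the pipeline runs on
    variables: dict = field(default_factory=dict)


def plan_triggers(version: str) -> list[Trigger]:
    """Return the legacy trigger followed by one trigger per environment."""
    # Legacy multi-env deployment: a single pipeline on master deploys everywhere.
    legacy = Trigger(ref="master", variables={"DEPLOY_VERSION": version})

    # New strategy: one pipeline per environment on next-gen. No check-mode
    # variable is passed because, per the proposal, next-gen defaults to it.
    individual = [
        Trigger(
            ref="next-gen",
            variables={"DEPLOY_VERSION": version, "DEPLOY_ENVIRONMENT": env},
        )
        for env in ENVIRONMENTS
    ]
    return [legacy, *individual]


if __name__ == "__main__":
    for trigger in plan_triggers("example-package-version"):
        print(trigger.ref, trigger.variables)
```

In practice the individual triggers would be started one after another as each environment finishes, not all at once; the list only shows what differs between the two strategies.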

Here follows a brief diagram of the new pipeline:

stateDiagram-v2
    [*] --> tag
    state fork_state <<fork>>
    tag --> fork_state
    fork_state --> wait_cng
    fork_state --> wait_omnibus

    state join_state <<join>>
    wait_cng --> join_state
    wait_omnibus --> join_state

    join_state --> trigger_gstg

    join_state --> trigger_complete_deployment
    note right of trigger_complete_deployment : already implemented up to here

    trigger_gstg --> trigger_cny

    state fork_state_cny <<fork>>
    trigger_cny --> fork_state_cny
    fork_state_cny --> baking_time
    fork_state_cny --> manual_promotion
    state join_state_cny <<join>>
    baking_time --> join_state_cny
    manual_promotion --> join_state_cny
    note right of baking_time : no longer a trigger
    note right of manual_promotion : no longer a trigger

    join_state_cny --> trigger_gprd

    trigger_gprd --> [*]

Baking time and manual promotions will be moved to release-tools as well. Today both are jobs in deployer that trigger release-tools, so they are only available during a deployer multi-env pipeline.
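To make the ordering in the diagram easier to check, the sketch below encodes the same edges as a dependency map and derives a valid execution order with the standard-library `graphlib` module (Python 3.9+). The job names are copied from the diagram; the `PIPELINE` map and both helper functions are hypothetical and exist only for this illustration.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish before it starts.
PIPELINE = {
    "wait_cng": {"tag"},
    "wait_omnibus": {"tag"},
    "trigger_complete_deployment": {"wait_cng", "wait_omnibus"},  # legacy path
    "trigger_gstg": {"wait_cng", "wait_omnibus"},
    "trigger_cny": {"trigger_gstg"},
    "baking_time": {"trigger_cny"},
    "manual_promotion": {"trigger_cny"},
    "trigger_gprd": {"baking_time", "manual_promotion"},
}


def execution_order(graph: dict) -> list:
    """Return one order in which every job comes after its prerequisites."""
    return list(TopologicalSorter(graph).static_order())


def prerequisites(graph: dict, job: str) -> set:
    """Return every job that must complete, directly or not, before `job`."""
    seen = set()
    stack = list(graph.get(job, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


if __name__ == "__main__":
    print(execution_order(PIPELINE))
    print(sorted(prerequisites(PIPELINE, "trigger_gprd")))
```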

We should evaluate if we can make use of multi-project pipeline status mirroring instead of active waiting.
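For reference, "active waiting" here means polling the downstream pipeline until it reaches a final status. A minimal sketch of that loop follows, assuming a caller-supplied `fetch_status` function that returns the current pipeline status as a string. The function name, the default interval and timeout, and the set of statuses treated as final are assumptions made for this example, not the release-tools code.

```python
import time
from typing import Callable

# Statuses treated as final in this sketch (an assumption for the example).
TERMINAL_STATUSES = {"success", "failed", "canceled", "skipped"}


def wait_for_pipeline(
    fetch_status: Callable[[], str],
    interval: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `fetch_status` until it reports a terminal status or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pipeline still '{status}' after {timeout} seconds")
        sleep(interval)


if __name__ == "__main__":
    # Simulated downstream pipeline that succeeds on the fourth poll.
    fake_statuses = iter(["pending", "running", "running", "success"])
    print(wait_for_pipeline(lambda: next(fake_statuses), interval=0.01, timeout=5))
```

Status mirroring would remove the need for this loop, since the trigger job itself would reflect the downstream result.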

Exit criteria

Individual deployments for staging, canary and production should be triggered by release tools

To do

Evaluate whether multi-project pipeline status mirroring can be used instead of active waiting
- Status mirroring cannot be used yet: jobs in the next stage are skipped if the child jobs are retried. Active waiting will be used in this iteration. See #1578 (comment 520398890) for more details.
Trigger a separate deployment for gstg
Trigger a separate deployment for cny
Baking time moved to release-tools
Trigger a separate deployment for production
Add a manual build to trigger production checks and then trigger a production deployment.
Testing
Create follow-ups

Implementation steps

Extract deployer logic so it can be re-used across multiple classes - gitlab-org/release-tools!1385 (merged)
Trigger a deployment to staging and canary using the next-gen branch and the DEPLOY_ENVIRONMENT variable - gitlab-org/release-tools!1386 (merged)
Move baking time to release tools gitlab-org/release-tools!1398 (merged)
Trigger an individual deployment to gprd gitlab-org/release-tools!1418 (merged)
Add a manual build to promote to prod: This one should run production checks, log the results in the monthly issue and then trigger an individual deployment to gprd gitlab-org/release-tools!1420 (merged)

Testing

Process

Prepare strategy
- Currently, the TRIGGER_REF environment variable indicates which deployer branch is used to perform deployments.
- master is used for legacy deployments (gprd, gprd-cny, gstg) and next-gen is used for individual deployments.
- For testing, the deployer branches need to be swapped: master will be used for individual deployments and next-gen will be used for legacy deployments.
- MR with the implementation gitlab-org/release-tools!1423 (merged)
Cherry-pick a commit in the auto-deploy branch or create a new auto-deploy branch. Commit cherry-picked d5857130d59f3d7751d9c6f9471107148bab53d5
Add the INDIVIDUAL_DEPLOYMENTS environment variable to release-tools
Ensure release-tools pipeline is generated correctly
Ensure the individual deployments to production are executed correctly.
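The branch swap described in the "Prepare strategy" step can be sketched as a small lookup. The `deployer_refs` helper and its `testing` flag are hypothetical; only the branch names and their default roles come from this issue, and how release-tools actually toggles the swap is not shown here.

```python
LEGACY = "legacy"
INDIVIDUAL = "individual"

# Default roles of the deployer branches, as described in this issue.
DEFAULT_REFS = {LEGACY: "master", INDIVIDUAL: "next-gen"}


def deployer_refs(testing: bool) -> dict:
    """Return the deployer branch used for each deployment kind.

    With `testing` enabled the two branches trade places, so individual
    deployments run on master and legacy deployments run on next-gen.
    """
    if not testing:
        return dict(DEFAULT_REFS)
    return {
        LEGACY: DEFAULT_REFS[INDIVIDUAL],
        INDIVIDUAL: DEFAULT_REFS[LEGACY],
    }


if __name__ == "__main__":
    print("default:", deployer_refs(testing=False))
    print("testing:", deployer_refs(testing=True))
```

Because next-gen defaults to check mode, the swap makes the individual deployments the ones that actually deploy, while the legacy pipeline keeps running as a dry run for comparison.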

Notes

The first round of testing was partially successful: some notifications were missed because of a missing environment variable.
The second round of testing was successful, see #1578 (comment 556457618). Minor follow-ups to be created:
- Rename Pipeline: deployer to Pipeline: Release Tools
- Ignore gprd-checks when triggering an individual deployment
- Remove INDIVIDUAL_DEPLOYMENTS environment variable

What to do if something goes wrong?

Remove the INDIVIDUAL_DEPLOYMENTS environment variable from release-tools at https://ops.gitlab.net/gitlab-org/release/tools/-/settings/ci_cd. Subsequent packages will use the legacy deployer pipeline, or,
Trigger an individual deployment manually, e.g. /chatops run deploy <package> --production
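The first rollback option works because the variable acts as a switch: once it is gone, the next package falls back to the legacy pipeline. A minimal sketch of that behaviour follows, assuming that any non-empty value enables individual deployments. The `active_strategy` helper and the value check are assumptions for this example, since the issue only states that removing the variable restores the legacy behaviour.

```python
import os
from typing import Mapping, Optional

FLAG = "INDIVIDUAL_DEPLOYMENTS"


def active_strategy(env: Optional[Mapping[str, str]] = None) -> str:
    """Return which strategy performs the deployment for the next package.

    Assumption: a missing or empty variable means the legacy multi-env
    deployer pipeline is used.
    """
    if env is None:
        env = os.environ
    return "individual" if env.get(FLAG) else "legacy"


if __name__ == "__main__":
    print(active_strategy({FLAG: "true"}))  # variable present
    print(active_strategy({}))              # variable removed
```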

Development log

March 4th, 2021 - Call to discuss this issue https://youtu.be/xI2MHSt5Y0Y. Summarized:
- Multi-project pipeline status mirroring has a bug that prevents us from using this feature. To unblock this issue we'll use an active waiting strategy.
- The purpose of &154 is to have a single pipeline that coordinates release pipelines. For this issue, we're going to trigger an individual deployment pipeline for each environment.
- As a safety net, we will trigger the new deployment strategy in parallel with the current multi-env deployment.
- Waiting time for staging and canary can use a strategy similar to the Omnibus and CNG waiting time: a delayed pipeline (the delay will depend on the environment) and active polling.
- Additional jobs (QA, Slack notifications, etc.) will remain in deployer; they can be moved in a future iteration.
March 7th, 2021 - Deployer logic extracted into a module gitlab-org/release-tools!1385 (merged)
March 9th, 2021 - Logic to trigger individual deployments to staging and canary implemented on gitlab-org/release-tools!1386 (merged)
March 10th, 2021 - gitlab-org/release-tools!1386 (merged) was sent for review.
March 17th, 2021
- gitlab-org/release-tools!1386 (merged) was merged
- A typo was noticed on the environment variables used by the auto_deploy:deploy task gitlab-org/release-tools!1395 (merged)
March 18th, 2021
- A release-tools pipeline triggered a deployment to all our environments, plus individual deployments in check mode https://ops.gitlab.net/gitlab-org/release/tools/-/pipelines/518798
- A merge request adding the baking-time job to release-tools was sent for review gitlab-org/release-tools!1398 (comment 532751714)
April 2nd, 2021
- gitlab-org/release-tools!1386 (merged) was merged
- An error was reported on auto_deploy:wait:cny - #1578 (comment 543877376)
- MR was submitted to fix the error gitlab-org/release-tools!1416 (merged)
- auto_deploy:wait:cny is working again https://ops.gitlab.net/gitlab-org/release/tools/-/jobs/3529393 (this one failed since the deployer pipeline also failed https://ops.gitlab.net/gitlab-com/gl-infra/deployer/-/jobs/3530699).
April 5th, 2021
- https://ops.gitlab.net/gitlab-org/release/tools/-/pipelines/543575 release tools pipeline with baking time 🎉
- auto_deploy:baking_time failed silently with 2021-04-05 12:52:29.683916 E [dry-run] ReleaseTools::Promotion::Checks::GitlabDeploymentHealth -- Cannot detect gitlab deployment health -- {:error=>#<HTTP::ConnectionError: failed to connect: Operation timed out - connect(2) for "thanos-query-frontend-internal.ops.gke.gitlab.net" port 9090>}
- MR to fix the above failure gitlab-org/release-tools!1417 (merged)
- MR to trigger an individual deployment to prod was submitted gitlab-org/release-tools!1418 (merged)
April 6th, 2021
- MR to trigger individual deployments to prod was merged
- MR to manually trigger production checks and then individual production deployment was submitted
April 7th, 2021
- MR to manually trigger production checks and then individual production deployment was merged
April 8th, 2021
- Release tools pipeline successfully triggered an individual deployment to prod https://ops.gitlab.net/gitlab-org/release/tools/-/pipelines/548947 / https://ops.gitlab.net/gitlab-com/gl-infra/deployer/-/pipelines/549738
April 9th, 2021
- MR that prepares a test strategy was submitted gitlab-org/release-tools!1423 (merged)
April 19th, 2021
- gitlab-org/release-tools!1423 (merged) was merged
- First round of testing started #1578 (comment 555309879). There was a bug related to Slack notifications and messages being posted on the monthly issue. Should be fixed by gitlab-org/release-tools!1428 (merged)
April 20th, 2021
- Second round of testing was successfully completed #1578 (comment 556457618)

Follow ups

Update links to point to release-tools instead of deployer - #1687 (closed)
Ignore gprd-checks when the deployment is triggered from the coordinator-pipeline - #1688 (closed)
Remove legacy deployment - #1689 (closed)
Remove test environment variable - #1690 (closed)

Edited Apr 21, 2021 by Mayra Cabrera