[Feature flag] Rollout of `ci_build_finished_worker_namespace_changed`
Summary
This issue tracks the production rollout of moving workers under the `Ci` namespace, a change that is currently behind the `ci_build_finished_worker_namespace_changed` feature flag.
Additional information on why this flag was introduced is available in !64934 (comment 614945044).
Feature Issue #329450 (closed)
Owners
- Team: Verify:Pipeline Execution
- Most appropriate Slack channel to reach out to: `#g_pipeline-execution`
- Best individual to reach out to: @ck3g @allison.browne
- PM: @jreporter
The Rollout Plan
- Partial Rollout on GitLab.com with testing groups
- Rollout on GitLab.com for a certain period (How long)
- Percentage Rollout on GitLab.com
- Rollout Feature for everyone as soon as it's ready
Testing Groups/Projects/Users
- `gitlab-org/gitlab` project
- `allison.browne/ci-hello-world` project
Expectations
What are we expecting to happen?
We will stop calling `BuildFinishedWorker` (which currently inherits from `Ci::BuildFinishedWorker`) and call `Ci::BuildFinishedWorker` instead.
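The switch described above can be illustrated with a minimal, self-contained Ruby sketch. It is not the code from the merge request: `FlagStore`, `enqueue_build_finished`, and the string returned by `perform_async` are hypothetical stand-ins, used only to show how a flag could select between the legacy and the namespaced worker.

```ruby
# Hypothetical in-memory flag store; a stand-in, not GitLab's Feature API.
module FlagStore
  @flags = {}

  def self.set(name, value)
    @flags[name] = value
  end

  def self.enabled?(name)
    @flags.fetch(name, false)
  end
end

module Ci
  # Namespaced worker. perform_async is stubbed to return a description
  # instead of enqueuing a real Sidekiq job.
  class BuildFinishedWorker
    def self.perform_async(build_id)
      "#{name} enqueued for build #{build_id}"
    end
  end
end

# Legacy, un-namespaced worker; per the issue it currently inherits
# from the namespaced one.
class BuildFinishedWorker < Ci::BuildFinishedWorker
end

# Hypothetical call site: pick the worker class based on the flag.
def enqueue_build_finished(build_id)
  worker =
    if FlagStore.enabled?(:ci_build_finished_worker_namespace_changed)
      Ci::BuildFinishedWorker
    else
      BuildFinishedWorker
    end
  worker.perform_async(build_id)
end

FlagStore.set(:ci_build_finished_worker_namespace_changed, true)
puts enqueue_build_finished(42)
```

Because the legacy class inherits from the namespaced one, both branches run the same logic; in this sketch only the class name, and therefore the queue a real job would land on, differs.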
What might happen if this goes wrong?
What can we monitor to detect problems with this?
The most relevant dashboards for this change are:
- https://dashboards.gitlab.net/d/stage-groups-pipeline_execution/stage-groups-group-dashboard-verify-pipeline-execution?orgId=1&from=now-1h&to=now&var-PROMETHEUS_DS=Global&var-environment=gprd&var-stage=main&var-controller=All&var-action=All&var-runner_type=All&viewPanel=37 - the namespaced workers should go up and the other ones down
- https://dashboards.gitlab.net/d/stage-groups-pipeline_execution/stage-groups-group-dashboard-verify-pipeline-execution?orgId=1&from=now-1h&to=now&var-PROMETHEUS_DS=Global&var-environment=gprd&var-stage=main&var-controller=All&var-action=All&var-runner_type=All&viewPanel=38 - there should not be any significant change in the error rates
- https://dashboards.gitlab.net/d/sidekiq-main/sidekiq-overview?orgId=1&from=now-1h&to=now&var-PROMETHEUS_DS=Global&var-environment=gprd&var-stage=main&viewPanel=296 - check the queue sizes for the workers. The old queues should drain to zero (or stay at zero), and the new ones should not grow too large.
"pipeline_processing:ci_build_finished"
"pipeline_processing:build_finished"
"pipeline_background:ci_archive_trace"
"pipeline_background:archive_trace"
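The queue check described above can be expressed as a small Ruby sketch. It assumes that the `ci_`-prefixed queues are the new, namespaced ones and that queue sizes have already been collected into a hash; `queue_problems` and the `max_new_size` threshold of 1,000 jobs are hypothetical and not taken from this issue.

```ruby
# Queue names come from the issue. Treating the ci_-prefixed ones as "new"
# is an assumption based on the worker being moved under the Ci namespace.
OLD_QUEUES = %w[
  pipeline_processing:build_finished
  pipeline_background:archive_trace
].freeze

NEW_QUEUES = %w[
  pipeline_processing:ci_build_finished
  pipeline_background:ci_archive_trace
].freeze

# sizes: Hash of queue name => number of pending jobs.
# Returns a list of human-readable problems; an empty list means the
# queues look the way the rollout expects.
def queue_problems(sizes, max_new_size: 1_000)
  problems = []

  OLD_QUEUES.each do |queue|
    size = sizes.fetch(queue, 0)
    problems << "#{queue} has not drained (#{size} jobs)" if size > 0
  end

  NEW_QUEUES.each do |queue|
    size = sizes.fetch(queue, 0)
    problems << "#{queue} is too large (#{size} jobs)" if size > max_new_size
  end

  problems
end

sample = {
  "pipeline_processing:build_finished" => 0,
  "pipeline_processing:ci_build_finished" => 12
}
puts queue_problems(sample).inspect
```

A check like this only mirrors what the dashboard panel shows; the threshold would need to be chosen from the normal queue sizes observed before the rollout.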
Rollout Steps
Rollout on non-production environments
- Ensure that the feature MRs have been deployed to non-production environments.
  - `/chatops run auto_deploy status 12bd2c8f64f3dc95fb6c6f62e4c122f066f473dc`
- Enable the feature globally on non-production environments.
  - `/chatops run feature set ci_build_finished_worker_namespace_changed true --dev`
  - `/chatops run feature set ci_build_finished_worker_namespace_changed true --staging`
- Verify that the feature works as expected. Posting the QA result in this issue is preferable.
Preparation before production rollout
- Ensure that the feature MRs have been deployed to both production and canary.
  - `/chatops run auto_deploy status 12bd2c8f64f3dc95fb6c6f62e4c122f066f473dc`
- Ensure that you or a representative in development can be available for at least 2 hours after feature flag updates in production. If a different developer will be covering, or an exception is needed, please inform the oncall SRE by using the `@sre-oncall` Slack alias.
- Announce on the feature issue an estimated time this will be enabled on GitLab.com.
- If the feature flag in code has an actor, enable it on GitLab.com for testing groups/projects.
  - `/chatops run feature set --project=gitlab-org/gitlab ci_build_finished_worker_namespace_changed true`
  - `/chatops run feature set --project=allison.browne/ci-hello-world ci_build_finished_worker_namespace_changed true`
- Verify that the feature works as expected. Posting the QA result in this issue is preferable.
Global rollout on production
- Incrementally roll out the feature.
  - If the feature flag in code has an actor, perform actor-based rollout.
    - `/chatops run feature set ci_build_finished_worker_namespace_changed true`
- Announce on the feature issue that the feature has been globally enabled.
- Wait for at least one day for the verification term.
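For intuition about what a percentage rollout over actors means, the following Ruby sketch hashes an actor identifier into one of 100 buckets and enables the flag for buckets below the chosen percentage. This is an illustration under stated assumptions, not a description of how GitLab evaluates flags: `actor_enabled?` is a hypothetical helper, and the choice of CRC32 over the flag name and actor is an assumption made for the example.

```ruby
require "zlib"

# Hypothetical helper: decides whether an actor falls inside a rollout
# percentage by hashing the flag name together with the actor identifier.
# The same actor always lands in the same bucket, so raising the
# percentage only ever adds actors and never removes them.
def actor_enabled?(flag_name, actor_id, percentage)
  bucket = Zlib.crc32("#{flag_name}:#{actor_id}") % 100
  bucket < percentage
end

flag = "ci_build_finished_worker_namespace_changed"
projects = %w[gitlab-org/gitlab allison.browne/ci-hello-world]

[0, 25, 50, 100].each do |percentage|
  enabled = projects.select { |project| actor_enabled?(flag, project, percentage) }
  puts "#{percentage}%: #{enabled.inspect}"
end
```

The useful property for a rollout is stability: an actor that has the feature at a lower percentage keeps it at every higher percentage, so behavior does not flip back and forth between steps.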
Release the feature
After the feature has been deemed stable, the cleanup should be done as soon as possible to permanently enable the feature and reduce complexity in the codebase.
- Create a merge request to remove the `ci_build_finished_worker_namespace_changed` feature flag. Ask for review and merge it.
  - Remove all references to the feature flag from the codebase.
  - Remove the YAML definitions for the feature from the repository.
  - Create a changelog entry.
  - Migrate the queues as described in the documentation for removing a worker: https://docs.gitlab.com/ee/development/sidekiq_style_guide.html#removing-workers
- Ensure that the cleanup MR has been deployed to both production and canary. If the merge request was deployed before the code cutoff, the feature can be officially announced in a release blog post.
  - `/chatops run auto_deploy status <merge-commit-of-cleanup-mr>`
- Close the feature issue to indicate the feature will be released in the current milestone.
- Clean up the feature flag from all environments by running these chatops commands in the `#production` channel:
  - `/chatops run feature delete ci_build_finished_worker_namespace_changed --dev`
  - `/chatops run feature delete ci_build_finished_worker_namespace_changed --staging`
  - `/chatops run feature delete ci_build_finished_worker_namespace_changed`
- Close this rollout issue.
Rollback Steps
- This feature can be disabled by running the following chatops command:
  - `/chatops run feature set ci_build_finished_worker_namespace_changed false`
Edited by Allison Browne