[FF] `ci_pipeline_archival_setting` -- Roll out pipeline archival behavior on GitLab.com
Summary
This issue tracks the production rollout of the feature currently behind the `ci_pipeline_archival_setting` feature flag, introduced in !237463 (merged).
The flag gates the existing pipeline archival admin setting (`archive_builds_in_seconds`) per project, so we can gradually enable the behavior on GitLab.com (where it has never been enforced) while keeping self-managed and Dedicated instances unaffected. See https://gitlab.com/gitlab-org/gitlab/-/work_items/547524 for the broader rollout plan.
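To make the gating concrete, here is a minimal Python sketch of the per-pipeline decision. It assumes a pipeline counts as archived when the flag is on for its project, `archive_builds_in_seconds` is set, and the pipeline was created before `now - archive_builds_in_seconds`. The function `pipeline_is_archived` and its parameters are hypothetical; GitLab's real check lives in the Ruby codebase and may differ, for example in how it treats the exact cutoff boundary.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def pipeline_is_archived(
    flag_enabled_for_project: bool,
    archive_builds_in_seconds: Optional[int],
    pipeline_created_at: datetime,
    now: datetime,
) -> bool:
    """Hypothetical model: True if retry/cancel/play should be refused for the pipeline."""
    # Flag off for this project: keep the previous GitLab.com behavior (never archived).
    if not flag_enabled_for_project:
        return False
    # Admin setting unset: nothing is archived, even with the flag on.
    if archive_builds_in_seconds is None:
        return False
    cutoff = now - timedelta(seconds=archive_builds_in_seconds)
    # Assumption: pipelines created strictly before the cutoff become read-only.
    return pipeline_created_at < cutoff
```

With a 30-day cutoff, a pipeline created 40 days ago is archived while one created yesterday is not, and turning the flag off for the project restores the old behavior regardless of the setting.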
Owners
- Most appropriate Slack channel to reach out to: #g_ci-platform
- Best individual to reach out to: @mbobin
Expectations
What are we expecting to happen?
- Self-managed and Dedicated instances: no behavior change (the flag is `default_enabled: true`).
- GitLab.com: the flag is turned off via chatops before the deploy. We then re-enable it for an internal canary project, then for `gitlab-org`, then by percentage of projects, monitoring for unexpected impact on retry/cancel/play actions and on partition archival.
- Once `archive_builds_in_seconds` is set on GitLab.com (separate Change Management Request), pipelines older than the configured cutoff become read-only for retry/cancel/play.
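The percentage stage relies on each project landing deterministically inside or outside the rollout, so that raising the percentage only ever adds projects. A minimal Python sketch of that idea, modeled loosely on the percentage-of-actors gate in Flipper (the library behind GitLab feature flags): hash the flag name plus an actor ID into a stable bucket and compare it with the percentage. The `Project:<id>` actor format and the exact hashing are assumptions, not a description of GitLab's implementation.

```python
import zlib


def enabled_for_actor(flag_name: str, actor_id: str, percentage: float) -> bool:
    """Decide deterministically whether an actor falls inside a percentage rollout.

    Hashing flag name + actor ID gives each actor a fixed bucket in [0, 100 * scale),
    so raising the percentage only adds actors; none flip back off.
    """
    scale = 1000  # allows fractional percentages such as 0.5
    bucket = zlib.crc32(f"{flag_name}{actor_id}".encode("utf-8")) % (100 * scale)
    return bucket < percentage * scale
```

Because the bucket depends only on the flag name and the actor, every project enabled at 5% stays enabled at 25%, 50%, and 100%, which keeps the user-visible change monotonic between steps.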
What can go wrong and how would we detect it?
- Users can no longer retry old pipelines that they previously could. This surfaces as a spike in 403s from the pipeline/processable policy paths and as user complaints. `Gitlab::Ci::Pipeline::AccessLogger` produces structured logs with `archived: true/false` for each access; we'll use them to monitor.
- Background partition archival behavior changes unexpectedly. Use the Sidekiq dashboard for `Ci::Partitions::ArchiveWorker` (or equivalent) to confirm there is no abnormal spike.
- Relevant dashboards on https://dashboards.gitlab.net: Verify / Pipeline Execution group dashboards, and Sidekiq throughput for CI workers.
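A minimal Python sketch for summarizing the access logs mentioned above, assuming they can be exported as JSON lines carrying a boolean `archived` key. Only the `archived` field comes from this issue; the function name and the output keys are hypothetical. Comparing `archived_ratio` before and after a rollout step gives a quick cross-check against the 403 graphs.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_archived_access(lines: Iterable[str]) -> dict:
    """Count access-log entries by their `archived` field.

    Assumes one JSON object per line. Lines that are not valid JSON, are not
    objects, or lack an `archived` key are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict) or "archived" not in entry:
            continue
        counts["archived" if entry["archived"] else "live"] += 1
    total = counts["archived"] + counts["live"]
    ratio = counts["archived"] / total if total else 0.0
    return {"archived": counts["archived"], "live": counts["live"], "archived_ratio": ratio}
```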
Rollout Steps
Note: Please make sure to run the chatops commands in the Slack channel that is impacted by the command.
Rollout on non-production environments
- Verify that the MR with the feature flag is merged to `master` and has been deployed to non-production environments with `/chatops gitlab run auto_deploy status <merge-commit-of-your-feature>`
- Enable the feature globally on non-production environments with `/chatops gitlab run feature set ci_pipeline_archival_setting true --dev --pre --staging --staging-ref`
- Verify that the feature works as expected on `staging-canary`: with `archive_builds_in_seconds` set to a small value, retrying an old pipeline is blocked.
- If the feature flag causes end-to-end tests to fail, disable the feature flag on staging to avoid blocking deployments.
Before production rollout
- Coordinate with infrastructure to schedule the chatops command to disable the flag on GitLab.com before the deploy that ships this MR lands, so the production state matches the previous behavior on day one.
Specific rollout on production
For visibility, all /chatops commands that target production must be executed in the #production Slack channel and cross-posted (with the command results) to the responsible team's Slack channel.
- Ensure that the feature MRs have been deployed to both production and canary with `/chatops gitlab run auto_deploy status <merge-commit-of-your-feature>`
- Before the deploy reaches production, disable the flag globally on GitLab.com: `/chatops gitlab run feature set ci_pipeline_archival_setting false`
- Re-enable it for an internal canary project (project actor): `/chatops gitlab run feature set --project=gitlab-org/gitlab-test ci_pipeline_archival_setting true`
- Verify the expected behavior on the canary project: once an admin sets a cutoff, retries are blocked for pipelines older than it.
- Re-enable it for the `gitlab-org` and `gitlab-com` groups (group actor): `/chatops gitlab run feature set --group=gitlab-org,gitlab-com ci_pipeline_archival_setting true`
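The ordering above (disable globally, then re-enable for specific actors) depends on how the flag's gates combine. Below is a deliberately simplified Python model, assuming the flag is on for a project when either the global gate is open or the project is listed as an actor, and that a global `false` clears every gate. That second point is our understanding of Flipper's boolean disable and should be checked against the feature flag docs. `FlagGates` and its methods are hypothetical. Group actors are left out on purpose, because whether a `--group` gate applies depends on which actor the code passes when it checks the flag.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FlagGates:
    """Hypothetical model of one flag: a global boolean gate plus project-actor gates."""

    boolean: bool = False
    projects: Set[str] = field(default_factory=set)

    def set_global(self, value: bool) -> None:
        """Model `feature set <flag> true|false` with no actor."""
        self.boolean = value
        if not value:
            # Assumption: a global "false" clears every gate, including project actors,
            # leaving the flag fully off until actors are re-enabled.
            self.projects.clear()

    def enable_for_projects(self, *paths: str) -> None:
        """Model `feature set --project=<path> <flag> true`."""
        self.projects.update(paths)

    def enabled_for(self, project_path: str) -> bool:
        # Any open gate enables the flag: the global boolean or a matching project actor.
        return self.boolean or project_path in self.projects
```

Under this model, the global `false` both reproduces the previous GitLab.com behavior on day one and doubles as a complete rollback, since no per-project gate survives it.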
Preparation before global rollout
- Set a milestone on this rollout issue to signal when the feature flag is expected to be enabled, and then removed once it is stable.
- Check whether the feature flag change needs to be accompanied by a change management issue. The Change Management Request to set `archive_builds_in_seconds` on GitLab.com is the related infra-side ticket; link it here once filed.
- Ensure that you or a representative in development can be available for at least 2 hours after feature flag updates in production.
- Ensure that documentation exists for the feature.
Global rollout on production
For visibility, all /chatops commands that target production must be executed in the #production Slack channel and cross-posted (with the command results) to the responsible team's Slack channel.
- Incrementally roll out the feature on production using project-actor percentages:
  - `/chatops gitlab run feature set ci_pipeline_archival_setting 1 --actors`
  - `/chatops gitlab run feature set ci_pipeline_archival_setting 5 --actors`
  - `/chatops gitlab run feature set ci_pipeline_archival_setting 25 --actors`
  - `/chatops gitlab run feature set ci_pipeline_archival_setting 50 --actors`
  - `/chatops gitlab run feature set ci_pipeline_archival_setting 100 --actors`
- Between every step, wait at least 15 minutes and monitor the appropriate graphs on https://dashboards.gitlab.net.
- After the feature has been 100% enabled, wait for at least one day before releasing the feature.
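The timing rules above (at least 15 minutes between steps, then at least one day at 100%) can be turned into an earliest-possible schedule. A minimal Python sketch, assuming every step looks healthy on the dashboards; the constants mirror this issue, while `rollout_plan` itself is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

FLAG = "ci_pipeline_archival_setting"
PERCENTAGES = [1, 5, 25, 50, 100]  # project-actor percentages from this issue
MIN_WAIT = timedelta(minutes=15)   # minimum wait between steps
SOAK = timedelta(days=1)           # minimum wait at 100% before releasing


def rollout_plan(start: datetime) -> Tuple[List[Tuple[datetime, str]], datetime]:
    """Return the earliest (time, chatops command) for each step and the earliest release time.

    Assumes no step reveals a problem; any pause simply pushes later steps back.
    """
    plan: List[Tuple[datetime, str]] = []
    at = start
    for pct in PERCENTAGES:
        plan.append((at, f"/chatops gitlab run feature set {FLAG} {pct} --actors"))
        at += MIN_WAIT
    # The one-day soak starts when the 100% step is applied.
    release_at = plan[-1][0] + SOAK
    return plan, release_at
```

Starting at 09:00, the 100% step lands no earlier than 10:00, and the feature can be released no earlier than 10:00 the following day.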
Release the feature
After the feature has been deemed stable:
- Create a merge request to remove the `ci_pipeline_archival_setting` feature flag:
  - Remove all references to the feature flag from the codebase.
  - Remove the YAML definition.
- Close #600846 (closed) to indicate the feature is released.
- Once the cleanup MR has been deployed to production, clean up the feature flag from all environments: `/chatops gitlab run feature delete ci_pipeline_archival_setting --dev --pre --staging --staging-ref --production`
- Close this rollout issue.
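To support the "remove all references" step, here is a small Python sketch that walks a checkout and reports every file and line mentioning the flag name. It is a plain text search, so it also reports comments and docs; line number `0` marks a file whose name contains the flag (such as the YAML definition). The function name and the skipped directory list are assumptions to adjust for the repository layout.

```python
import os
from typing import List, Tuple

FLAG = "ci_pipeline_archival_setting"
SKIP_DIRS = {".git", "node_modules", "tmp"}  # assumption: directories not worth scanning


def find_flag_references(root: str, flag: str = FLAG) -> List[Tuple[str, int]]:
    """Return sorted (path, line_number) pairs for every mention of `flag` under `root`."""
    hits: List[Tuple[str, int]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if flag in name:
                hits.append((path, 0))  # the file name itself mentions the flag
            try:
                with open(path, encoding="utf-8") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if flag in line:
                            hits.append((path, lineno))
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file
    return sorted(hits)
```

An empty result after the cleanup MR is a reasonable signal that the code side is done before running the `feature delete` command.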
Rollback Steps
- This feature can be disabled on production by running: `/chatops gitlab run feature set ci_pipeline_archival_setting false`
- Disable the feature flag on non-production environments: `/chatops gitlab run feature set ci_pipeline_archival_setting false --dev --pre --staging --staging-ref`
- Delete the feature flag from all environments: `/chatops gitlab run feature delete ci_pipeline_archival_setting --dev --pre --staging --staging-ref --production`