[Feature flag] Rollout of `ci_delete_objects`
What
Related issues: #220422 (closed), https://gitlab.com/gitlab-org/gitlab/-/issues/223034, https://gitlab.com/gitlab-org/gitlab/-/issues/233939
Related merge requests: !42095 (merged), !42237 (merged), !39464 (merged), !43100 (merged), !42242 (merged)
We have changed how `Ci::DestroyExpiredJobArtifactsService` works. Instead of removing the artifacts one by one, it now works in batches of 100: it copies the information needed to identify the associated object storage files into a new table called `ci_deleted_objects`, deletes the records from `ci_job_artifacts`, and updates the `project_statistics` for them.
`Ci::DeleteObjectsWorker` does the actual removal from object storage. This worker is configured to run concurrently, with a default `max_running_jobs` of 0, meaning that it will not execute any jobs unless one of the concurrency feature flags is on. Raising the concurrency setting takes effect only after the next execution of `Ci::ScheduleDeleteObjectsCronWorker`, which should happen every 16 minutes. Lowering it should reduce the number of running jobs instantly.
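The batched hand-off performed by `Ci::DestroyExpiredJobArtifactsService` can be sketched roughly as follows. This is a minimal in-memory illustration, not the service's actual code: `ExpiredArtifact`, `destroy_expired_in_batches`, and the plain arrays and hash standing in for the `ci_job_artifacts`, `ci_deleted_objects`, and `project_statistics` tables are all hypothetical names introduced for this example.

```ruby
BATCH_SIZE = 100

# Hypothetical stand-in for an expired row in ci_job_artifacts.
ExpiredArtifact = Struct.new(:id, :project_id, :file_path, :size)

# Moves expired artifacts out of `job_artifacts` in batches.
# `deleted_objects` plays the role of ci_deleted_objects and
# `project_statistics` maps project_id => artifact bytes in use.
# Returns the number of batches processed.
def destroy_expired_in_batches(job_artifacts, deleted_objects, project_statistics)
  # Materialize the batches first so the source array can be modified safely.
  batches = job_artifacts.each_slice(BATCH_SIZE).to_a

  batches.each do |batch|
    # In a real database, the three steps below would share one transaction.
    # 1. Copy what is needed to locate the files in object storage.
    deleted_objects.concat(batch.map { |artifact| { file_path: artifact.file_path } })

    # 2. Delete the records from the artifacts "table".
    batch_ids = batch.map(&:id)
    job_artifacts.reject! { |artifact| batch_ids.include?(artifact.id) }

    # 3. Update the per-project statistics.
    batch.each { |artifact| project_statistics[artifact.project_id] -= artifact.size }
  end

  batches.size
end

artifacts = (1..250).map { |i| ExpiredArtifact.new(i, i % 2, "artifacts/#{i}.zip", 10) }
deleted = []
stats = { 0 => 5_000, 1 => 5_000 }

puts destroy_expired_in_batches(artifacts, deleted, stats) # 3 batches: 100, 100, 50
```

In this sketch the files themselves are untouched; removing them from object storage is left to a separate step, which mirrors the split between the service and `Ci::DeleteObjectsWorker`.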
Feature flags:
- `ci_delete_objects` - Turning this FF on will change how we remove expired job artifacts. We should see bulk inserts into `ci_deleted_objects` and mass deletes.
- `ci_delete_objects_low_concurrency` - Turning this FF on sets `max_running_jobs` to 2.
- `ci_delete_objects_medium_concurrency` - `max_running_jobs` will be 20 if `ci_delete_objects_low_concurrency` is off.
- `ci_delete_objects_high_concurrency` - `max_running_jobs` will be 50 if `ci_delete_objects_low_concurrency` and `ci_delete_objects_medium_concurrency` are off.
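The precedence between the concurrency flags can be read as a simple lookup. The sketch below assumes a hypothetical helper that receives the names of the enabled flags as symbols; it only illustrates the rules listed above and is not taken from the worker's code.

```ruby
# Hypothetical helper: resolves max_running_jobs from the enabled flag names.
# The low flag wins over medium, and medium wins over high.
def max_running_jobs(enabled_flags)
  if enabled_flags.include?(:ci_delete_objects_low_concurrency)
    2
  elsif enabled_flags.include?(:ci_delete_objects_medium_concurrency)
    20
  elsif enabled_flags.include?(:ci_delete_objects_high_concurrency)
    50
  else
    0 # default: the worker does not execute any jobs
  end
end

puts max_running_jobs([])                                      # 0
puts max_running_jobs([:ci_delete_objects_medium_concurrency,
                       :ci_delete_objects_high_concurrency])   # 20
```

One consequence of this ordering is that moving from low to medium concurrency requires turning the low flag off, not just turning the medium flag on.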
Future work
Because of #281688 (closed) we didn't get to check `ci_delete_objects_medium_concurrency` and `ci_delete_objects_high_concurrency`. Their cleanup is going to be tracked in #287632 (closed).
Owners
- Team: ~"group::continuous integration"
- Most appropriate Slack channel to reach out to: `#g_ci`
- Best individual to reach out to: @mbobin
Expectations
What are we expecting to happen?
- The number of expired job artifacts should go down
- Storage quota for projects should go down
What might happen if this goes wrong?
- Operations on `ci_job_artifacts` are atomic, and we should not persist anything into `ci_deleted_objects` without removing it from `ci_job_artifacts`, so reverting the feature flag should be safe.
What can we monitor to detect problems with this?
Thanos queries as explained at #247103 (comment 435056335)
Roll Out Steps
- Enable on staging
- Test on staging
- Ensure that documentation has been updated
- Coordinate a time to enable the flag with `#production` and `#g_delivery` on Slack.
- Announce on the issue an estimated time this will be enabled on GitLab.com
- Enable on GitLab.com by running chatops command in `#production`
- Cross post chatops Slack command to `#support_gitlab-com` (more guidance when this is necessary in the dev docs) and in your team channel
- Announce on the issue that the flag has been enabled
- Remove feature flag and add changelog entry
- After the flag removal is deployed, clean up the feature flag by running chatops command in `#production` channel