Raise Tag Count Limit for Ongoing Phase 2 Container Registry Migration: Paid Tier
Production Change
Change Summary
This is part of the work to upgrade and migrate the GitLab.com container registry to a new version backed by a metadata database and online garbage collection (gitlab-org&5523 (closed)). We are now working on Phase 2 (migrating existing repositories); see gitlab-org/gitlab#364566 (closed).
The migration is driven by Rails. One of the supporting application settings, container_registry_import_max_tags_count, controls the maximum number of tags that a container repository may have and still be migrated; repositories above the limit are skipped for a later retry. The setting defaults to 100, and we are now ready to raise this limit progressively to migrate larger repositories.
This change request is to update the container_registry_import_max_tags_count application setting in multiple iterations until no limit remains.
This approach is the same as for the free tier: #7022 (closed)
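As a rough illustration of how the limit gates the migration, the sketch below partitions repositories by tag count for a few limit values. The `Repo` struct, the `partition_by_tag_limit` helper, and the sample data are hypothetical and not taken from the Rails codebase. Whether the real comparison is inclusive of the limit is also an assumption.

```ruby
# Hypothetical stand-in for a container repository; not the Rails model.
Repo = Struct.new(:path, :tags_count)

# Split repositories into those at or under the limit (eligible for migration)
# and those over it (skipped for a later retry).
# Assumption: the boundary is inclusive.
def partition_by_tag_limit(repos, max_tags_count)
  repos.partition { |repo| repo.tags_count <= max_tags_count }
end

# Made-up sample data for illustration only.
sample_repos = [
  Repo.new("group/small", 80),
  Repo.new("group/medium", 7_500),
  Repo.new("group/large", 150_000)
]

[100, 10_000, 200_000].each do |limit|
  eligible, skipped = partition_by_tag_limit(sample_repos, limit)
  puts "limit=#{limit}: migrate #{eligible.map(&:path)}, skip #{skipped.map(&:path)}"
end
```

Each raise of the limit moves more repositories from the skipped group into the eligible group, which is why the change is applied in iterations.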
Change Details
- Services Impacted - Service::Container Registry
- Change Technician - @skarbek, @jennykim-gitlab
- Change Reviewer - @hswimelar
- Time tracking - 10 minutes each iteration
- Downtime Component - 0
Detailed steps for the change
Change Steps - steps to take to execute the change
Estimated Time to Complete (mins) - 10 minutes
To update the required application setting(s), open a Rails console with write permissions and run the following command:
- Set label change::in-progress: /label ~change::in-progress
- Set the container_registry_import_max_tags_count to the new value. (We'll need to do this a couple of times; each change will be recorded in a comment.)
  ::Gitlab::CurrentSettings.current_application_settings.update!(container_registry_import_max_tags_count: <value>)
- Set value to 10000 - #7343 (comment 1009294085)
- Set value to 200000 - #7343 (comment 1018335189)
- Set label change::complete: /label ~change::complete
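Because the update is repeated across several iterations, it may help to wrap it in a small guard that records the old and new values for the change comment. The following is a minimal sketch, not part of the change plan: `raise_import_tag_limit` is a hypothetical helper, and it only assumes that the settings object responds to `container_registry_import_max_tags_count` and `update!`, as the record used in the command above does. The guard covers increases only, so a final step that removes the limit altogether would need separate handling.

```ruby
# Hypothetical console helper (not part of GitLab).
# `settings` must respond to `container_registry_import_max_tags_count`
# and `update!(attributes)`.
def raise_import_tag_limit(settings, new_limit)
  old_limit = settings.container_registry_import_max_tags_count

  # Refuse anything that is not a strict increase, to catch typos.
  unless new_limit.is_a?(Integer) && new_limit > old_limit
    raise ArgumentError, "new limit #{new_limit.inspect} must be an integer above #{old_limit}"
  end

  settings.update!(container_registry_import_max_tags_count: new_limit)

  # Return both values so they can be pasted into the change comment.
  { old: old_limit, new: settings.container_registry_import_max_tags_count }
end

# Possible use in the Rails console:
#   raise_import_tag_limit(::Gitlab::CurrentSettings.current_application_settings, 10_000)
```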
Rollback
Rollback steps - steps to be taken in the event of a need to rollback this change
Estimated Time to Complete (mins) - 5 minutes
- Restore the container_registry_import_max_tags_count to the default value:
  ::Gitlab::CurrentSettings.current_application_settings.update!(container_registry_import_max_tags_count: 10000)
- Set label change::aborted: /label ~change::aborted
Monitoring
Key metrics to observe
- Metric: Import p90 latency
- Location: https://dashboards.gitlab.net/d/registry-migration/registry-migration-detail?orgId=1&from=now-1h&to=now&viewPanel=127
- What changes to this metric should prompt a rollback: We expect the import p90 latency to increase after raising the tag limit. At the current tag limit (6000), pre-imports peak at ~6h and final imports at ~2m. A severe, disproportionate increase should be analyzed to decide whether to roll back.
- Metric: Failed import rate
- Location: https://dashboards.gitlab.net/d/registry-migration/registry-migration-detail?orgId=1&from=now-1h&to=now&viewPanel=125
- What changes to this metric should prompt a rollback: A significant increase in the failed import rate should trigger a rollback of the tag limit increase.
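One way to make "out of proportion to the tag limit change" concrete for the latency metric is to compare how much the p90 grew with how much the limit grew. The sketch below is only an illustration: the helper name, the assumption that latency scales roughly linearly with the tag limit, and the `tolerance` factor of 2.0 are all hypothetical and would need to be agreed on by the change technician and reviewer.

```ruby
# Hypothetical check (not part of the change plan).
# Flags a latency increase as disproportionate when the p90 grew by more than
# `tolerance` times the factor by which the tag limit grew.
# Assumption: latency scales roughly linearly with the tag limit.
# `old_p90` and `new_p90` must use the same unit (seconds here).
def disproportionate_increase?(old_limit:, new_limit:, old_p90:, new_p90:, tolerance: 2.0)
  limit_factor = new_limit.to_f / old_limit
  latency_factor = new_p90.to_f / old_p90
  latency_factor > limit_factor * tolerance
end

six_hours = 6 * 3600

# Limit raised from 6000 to 10000; pre-import p90 goes from 6h to 10h.
puts disproportionate_increase?(old_limit: 6000, new_limit: 10_000,
                                old_p90: six_hours, new_p90: 10 * 3600)

# Same limit change, but pre-import p90 goes from 6h to 24h.
puts disproportionate_increase?(old_limit: 6000, new_limit: 10_000,
                                old_p90: six_hours, new_p90: 24 * 3600)
```

With these made-up numbers the first case stays within the tolerance and the second does not. A flagged result is a prompt for analysis, not an automatic rollback.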
Change Reviewer checklist
Check if the following applies:
- The scheduled day and time of execution of the change is appropriate.
- The change plan is technically accurate.
- The change plan includes estimated timing values based on previous testing.
- The change plan includes a viable rollback plan.
- The specified metrics/monitoring dashboards provide sufficient visibility for the change.
Check if the following applies:
- The complexity of the plan is appropriate for the corresponding risk of the change. (i.e. the plan contains clear details).
- The change plan includes success measures for all steps/milestones during the execution.
- The change adequately minimizes risk within the environment/service.
- The performance implications of executing the change are well-understood and documented.
- The specified metrics/monitoring dashboards provide sufficient visibility for the change.
- If not, is it possible (or necessary) to make changes to observability platforms for added visibility?
- The change has a primary and secondary SRE with knowledge of the details available during the change window.
- The labels blocks deployments and/or blocks feature-flags are applied as necessary
Change Technician checklist
Check if all items below are complete:
- The change plan is technically accurate.
- This Change Issue is linked to the appropriate Issue and/or Epic
- Change has been tested in staging and results noted in a comment on this issue.
- A dry-run has been conducted and results noted in a comment on this issue.
- For C1 and C2 change issues, the change event is added to the GitLab Production calendar.
- For C1 and C2 change issues, the SRE on-call has been informed prior to change being rolled out. (In #production channel, mention @sre-oncall and this issue and await their acknowledgement.)
- Release managers have been informed (If needed! Cases include DB change) prior to change being rolled out. (In #production channel, mention @release-managers and this issue and await their acknowledgment.)
- There are currently no active incidents that are severity::1 or severity::2
- If the change involves doing maintenance on a database host, an appropriate silence targeting the host(s) should be added for the duration of the change.