Container Registry: Apply post-deployment migrations in v4.19.0

Production Change

Change Summary

Request to manually apply the 5 new post-deployment migrations included in container registry v4.19.0 in the pre, gstg, and gprd environments.

These new post-deployment migrations are related to Manifest delete FK violation attempt during onl... (gitlab-org/container-registry#1489 - closed). The change was introduced and reviewed/approved by the Database team in gitlab-org/container-registry!2105 (merged).

Please read the Context section in this runbook to understand why manual intervention is currently needed.

Target post-deployment migrations:

20250224143831_post_create_manifests_partitions_subject_id_index_batch_1
20250224144143_post_create_manifests_partitions_subject_id_index_batch_2
20250224144144_post_create_manifests_partitions_subject_id_index_batch_3
20250224144145_post_create_manifests_partitions_subject_id_index_batch_4
20250224144152_post_create_manifests_subject_id_index

These post-deployment migrations (PDMs) create a new index on the manifests table (the parent table and its 64 partitions).
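The target migration IDs listed above each begin with a 14-digit prefix that reads as a YYYYMMDDHHMMSS timestamp. As a minimal sketch for double-checking the list before execution, the snippet below parses that prefix and confirms the IDs are already in ascending timestamp order. It assumes the prefix is a timestamp and that ascending order is the intended apply order; the helper names are hypothetical and are not part of the registry CLI.

```python
from datetime import datetime

# Migration IDs copied from the "Target post-deployment migrations" list.
MIGRATION_IDS = [
    "20250224143831_post_create_manifests_partitions_subject_id_index_batch_1",
    "20250224144143_post_create_manifests_partitions_subject_id_index_batch_2",
    "20250224144144_post_create_manifests_partitions_subject_id_index_batch_3",
    "20250224144145_post_create_manifests_partitions_subject_id_index_batch_4",
    "20250224144152_post_create_manifests_subject_id_index",
]


def parse_migration_id(migration_id):
    """Split an ID into (timestamp, name).

    Assumption: the text before the first underscore is a
    YYYYMMDDHHMMSS timestamp.
    """
    prefix, _, name = migration_id.partition("_")
    return datetime.strptime(prefix, "%Y%m%d%H%M%S"), name


def sort_by_timestamp(migration_ids):
    """Return the IDs sorted by their parsed timestamp prefix."""
    return sorted(migration_ids, key=lambda m: parse_migration_id(m)[0])


ordered = sort_by_timestamp(MIGRATION_IDS)
for migration_id in ordered:
    timestamp, name = parse_migration_id(migration_id)
    print(timestamp.isoformat(), name)
```

With this list, the four partition batch migrations sort ahead of the final migration on the parent table.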

Change Details

Services Impacted - Container Registry
Change Technician - @dat.tang.gitlab @siddharthkannan @madelacruz
Change Reviewer - @jennykim-gitlab
Scheduled Date and Time (UTC in format YYYY-MM-DD HH:MM) - 2025-04-03 09:00
Time tracking - 35 minutes
Downtime Component - NA

Set Maintenance Mode in GitLab

If your change involves scheduled maintenance, add a step to set and unset maintenance mode per our runbooks. This will make sure SLA calculations adjust for the maintenance period.

Detailed steps for the change

All steps in the Change Technician checklist are done.

Repeat for each environment:

pre
- Note all performed steps in a comment: <link-to-comment>
gstg
- Note all performed steps in a comment: <link-to-comment>
gprd
- Note all performed steps in a comment: <link-to-comment>

Change Steps - steps to take to execute the change

Estimated Time to Complete (mins) - 5 minutes for pre/gstg, 30 minutes for gprd

Set the change::in-progress label: /label ~change::in-progress
Proceed as described here.
Set the change::complete label if no environments are left: /label ~change::complete

Rollback

N/A. The post-deployment migrations included in this release only add a new index. In the worst case, index creation fails and the execution aborts. The only side effect is that this change would have to be repeated once a fix is released.

Contact @jdrpereira if the CR fails at any step, so that a mitigation plan can be investigated as soon as possible.

Monitoring

Key metrics to observe

Metric: Patroni saturation/stability
- Location: https://dashboards.gitlab.net/goto/1NhtRj2NR?orgId=1
- What changes to this metric should prompt a rollback: If an abnormal CPU usage spike is observed around the execution of this change, please abort the ongoing CLI command.
Metric: Service apdex and error rate
- Location: https://dashboards.gitlab.net/goto/KPbAgChHg?orgId=1
- What changes to this metric should prompt a rollback: If an abnormal spike is observed around the execution of this change, please abort the ongoing CLI command.
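The abort criteria above rely on the technician's judgement of what counts as "abnormal". Purely as an illustration of one way to reason about it, the sketch below compares a current reading against the median of readings taken before the change started. The function name, the sample values, and the factor of 2.0 are all hypothetical; this runbook does not define a numeric threshold, and the dashboards remain the source of truth.

```python
from statistics import median


def is_abnormal_spike(baseline_samples, current_value, factor=2.0):
    """Return True if current_value exceeds factor times the baseline median.

    Assumption: `factor` is an illustrative value, not a threshold
    taken from the runbook or the dashboards.
    """
    if not baseline_samples:
        raise ValueError("baseline_samples must not be empty")
    return current_value > factor * median(baseline_samples)


# Hypothetical CPU usage percentages observed before the change.
baseline_cpu = [20, 22, 21, 19]

print(is_abnormal_spike(baseline_cpu, 30))  # modest increase
print(is_abnormal_spike(baseline_cpu, 80))  # large jump
```

Using the median rather than the mean keeps a single noisy pre-change sample from skewing the baseline.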

Change Reviewer checklist

C4 C3 C2 C1:

Check if the following applies:
- The scheduled day and time of execution of the change is appropriate.
- The change plan is technically accurate.
- The change plan includes estimated timing values based on previous testing.
- The change plan includes a viable rollback plan.
- The specified metrics/monitoring dashboards provide sufficient visibility for the change.

C2 C1:

Check if the following applies:
- The complexity of the plan is appropriate for the corresponding risk of the change. (i.e. the plan contains clear details).
- The change plan includes success measures for all steps/milestones during the execution.
- The change adequately minimizes risk within the environment/service.
- The performance implications of executing the change are well-understood and documented.
- The specified metrics/monitoring dashboards provide sufficient visibility for the change.
  - If not, is it possible (or necessary) to make changes to observability platforms for added visibility?
- The change has a primary and secondary SRE with knowledge of the details available during the change window.
- The change window has been agreed with Release Managers in advance of the change. If the change is planned for APAC hours, this issue has an agreed pre-change approval.
- The labels "blocks deployments" and/or "blocks feature-flags" are applied as necessary.

Change Technician checklist

Edited Apr 03, 2025 by Siddharth Kannan