Skip to content

Reset skipped container repositories

Steve Abrams requested to merge reset-skipped-migrations into master

🔍 What does this MR do and why?

We are in the process of migrating/importing all container repositories to the new container registry. To start, we only imported container repositories that had less than 100 tags. Container repositories with more than 100 were skipped. In gitlab-com/gl-infra/production#7022 (closed) we have increased the tag limit, so we now would like to go back and "reset" the container repositories that were previously skipped so they can qualify for import.

This MR introduces a background migration that will change all container repositories that have a migration state of import_skipped with the reason being too_many_tags (value 2 in an enum) back to a migration state of default and removing the skipped reason.

🐘 Database

I have opted to use the older style of background migration over the new batched background migration because this migration will be much faster with the ability to supply a filtered relation in the initial post-deployment migration. The batched background migrations do not yet easily support this functionality.

Post-deployment migration:

EachBatch for scheduling jobs:

Initial record

select id from container_repositories where migration_state = 'import_skipped' and migration_skipped_reason = 2 order by id asc limit 1;

Explain 63ms: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10023/commands/35497

Max of next batch

select id from container_repositories where migration_state = 'import_skipped' and migration_skipped_reason = 2 and id >= 1 order by id asc limit 1 offset 10000;

Explain 1.3ms: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10023/commands/35498

Batch calculations

There are ~27700 records matching migration_state = 'import_skipped' and migration_skipped_reason = 2.

Batch size of 10_000, 3 jobs will be queued, total migration time of 6 minutes.

Background Migration

Min query for EachBatch

select id from container_repositories where migration_state = 'import_skipped' and migration_skipped_reason = 2 and id >= 1 and id < 24263 order by id asc limit 1;

Explain 171ms: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10023/commands/35515

Max of next batch

select id from container_repositories where migration_state = 'import_skipped' and migration_skipped_reason = 2 and id >= 1 and id < 24263 order by id asc limit 1 offset 100;

We will have 100 iterations, but it is likely only the 1st or very few will be cold-cache queries, so we can expect most iterations to be as fast as the next iteration above.

UPDATE query

UPDATE container_repositories SET migration_state = 'default', migration_skipped_reason = NULL where migration_state = 'import_skipped' and migration_skipped_reason = 2 and id >= 2868 and id < 5939;

Explain 82.3ms: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10023/commands/35509

We will have 100 UPDATES in each job

Job Timing calculations

Each job has 10_000 records

Each sub-batch processes 100 records.

This means there will be 100 update queries.

Total time for EachBatch Queries
100 * 204 = 20400ms => 20.4s # note that most of these queries should be much lower than 204ms

Total time for Updates
100 * 82.3 = 8230ms => 8.23s

Total job time = .083 + 20.4 + 8.23 = ~29s

This timing should give plenty of room for queries that might take longer to still fit within the 2 minute target time for the job.

A note on additional indexes

We could add a btree index on (migration_state, migration_skipped_at, id) to improve the timing on essentially all of the above queries by allowing for an Index Only Scan rather than the various index scans used throughout the existing query plans. For example, here is the query that took 204ms above with such an index 3.7ms: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10023/commands/35519

Given the existing times are within our 1s timing guidelines for background migrations, I don't think it is necessary to add the temporary index for this migration.

📷 Screenshots or screen recordings

N/A

How to set up and validate locally

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related: #359255 (closed)

Edited by Steve Abrams

Merge request reports