Repeat failed migrations with fix
What does this MR do and why?
The ScheduleResetDuplicateCiRunnersTokenEncryptedValuesOnProjects
and ScheduleResetDuplicateCiRunnersTokenValuesOnProjects migrations used a flawed
SQL query that is now corrected. Before scheduling the fixed migrations, we first clean up the records left behind by the previous runs in
DeleteFailedResetDuplicateCiRunnersTokenMigrationRecords.

I didn't create new classes for the fixed migrations because there is no downside to running the previous migrations (ResetDuplicateCiRunnersTokenValuesOnProjects, ResetDuplicateCiRunnersTokenEncryptedValuesOnProjects) with the fixed code, even in the scenario where a self-managed user upgrades directly from an old version to this new version.
Deleting the old pending migration records is also necessary because of #358239 (closed), in which mark_all_as_succeeded used a typoed name for ResetDuplicateCiRunnersTokenValuesOnProjects.
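For illustration, the cleanup step boils down to removing the tracking rows for both previous migration classes from `background_migration_jobs` (the DELETE and its query plan are shown below). A minimal pure-Ruby sketch of the selection criterion — the hash shape and helper name here are assumptions for illustration, not the actual migration code:

```ruby
# Class names whose stale background_migration_jobs rows the cleanup
# migration removes before the fixed migrations are rescheduled.
STALE_CLASSES = %w[
  ResetDuplicateCiRunnersTokenValuesOnProjects
  ResetDuplicateCiRunnersTokenEncryptedValuesOnProjects
].freeze

# Given job records (hashes with a :class_name key), return the ones
# the cleanup migration would delete. The real migration issues a
# single DELETE filtered on class_name instead.
def stale_job_records(jobs)
  jobs.select { |job| STALE_CLASSES.include?(job[:class_name]) }
end
```

Because #358239 left the pending rows untouched (mark_all_as_succeeded never matched them under the typoed name), deleting by the two correct class names is enough to clear them.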
Database query plans
runners_token_encrypted
Delete old migration job records
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9332/commands/33428
DELETE FROM "background_migration_jobs"
WHERE "background_migration_jobs"."class_name" = 'ResetDuplicateCiRunnersTokenEncryptedValuesOnProjects'

ModifyTable on public.background_migration_jobs  (cost=0.42..2257.23 rows=1792 width=6) (actual time=768.871..768.873 rows=0 loops=1)
   Buffers: shared hit=2460 read=416 dirtied=383
   I/O Timings: read=457.117 write=0.000
   ->  Index Scan using index_background_migration_jobs_on_class_name_and_status_and_id on public.background_migration_jobs  (cost=0.42..2257.23 rows=1792 width=6) (actual time=323.232..758.485 rows=1792 loops=1)
         Index Cond: (background_migration_jobs.class_name = 'ResetDuplicateCiRunnersTokenEncryptedValuesOnProjects'::text)
         Buffers: shared hit=620 read=415 dirtied=71
         I/O Timings: read=456.351 write=0.000

Fetch initial record
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9332/commands/33432
SELECT "projects"."id"
FROM "projects"
WHERE "projects"."runners_token_encrypted" IS NOT NULL
ORDER BY "projects"."id" ASC
LIMIT 1

Limit  (cost=0.56..0.61 rows=1 width=4) (actual time=9.392..9.394 rows=1 loops=1)
   Buffers: shared read=6
   I/O Timings: read=9.259 write=0.000
   ->  Index Only Scan using tmp_index_projects_on_id_and_runners_token_encrypted on public.projects  (cost=0.56..938886.22 rows=17952230 width=4) (actual time=9.390..9.390 rows=1 loops=1)
         Heap Fetches: 1
         Buffers: shared read=6
         I/O Timings: read=9.259 write=0.000

Fetch end of batch
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9332/commands/33433
SELECT "projects"."id"
FROM "projects"
WHERE "projects"."runners_token_encrypted" IS NOT NULL
  AND "projects"."id" >= 1
ORDER BY "projects"."id" ASC
LIMIT 1 OFFSET 10000

Limit  (cost=548.55..548.61 rows=1 width=4) (actual time=2551.440..2551.443 rows=1 loops=1)
   Buffers: shared hit=7456 read=1314 dirtied=561
   I/O Timings: read=2476.901 write=0.000
   ->  Index Only Scan using tmp_index_projects_on_id_and_runners_token_encrypted on public.projects  (cost=0.56..983766.75 rows=17952227 width=4) (actual time=1.213..2550.277 rows=10001 loops=1)
         Index Cond: (projects.id >= 6)
         Heap Fetches: 1229
         Buffers: shared hit=7456 read=1314 dirtied=561
         I/O Timings: read=2476.901 write=0.000

Fetch records in window
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9332/commands/33435
SELECT DISTINCT "projects"."runners_token_encrypted"
FROM "projects"
WHERE "projects"."runners_token_encrypted" IS NOT NULL
  AND "projects"."id" BETWEEN 1 AND 116672

HashAggregate  (cost=10737.88..11294.80 rows=55692 width=53) (actual time=89.468..92.247 rows=10001 loops=1)
   Group Key: projects.runners_token_encrypted
   Buffers: shared hit=7445 read=1299 dirtied=1
   I/O Timings: read=72.574 write=0.000
   ->  Index Only Scan using tmp_index_projects_on_id_and_runners_token_encrypted on public.projects  (cost=0.56..10598.60 rows=55713 width=53) (actual time=0.246..82.523 rows=10001 loops=1)
         Index Cond: ((projects.id >= 1) AND (projects.id <= 116672))
         Heap Fetches: 1208
         Buffers: shared hit=7445 read=1299 dirtied=1
         I/O Timings: read=72.574 write=0.000

Find the duplicate tokens from the batch
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9332/commands/33438
SELECT "projects"."runners_token_encrypted"
FROM "projects"
WHERE "projects"."runners_token_encrypted" IS NOT NULL
  AND "projects"."runners_token_encrypted" IN ( SELECT DISTINCT "projects"."runners_token_encrypted"
    FROM "projects"
    WHERE "projects"."runners_token_encrypted" IS NOT NULL
      AND "projects"."id" BETWEEN 1 AND 116672)
GROUP BY "projects"."runners_token_encrypted"
HAVING (COUNT(*) > 1)

HashAggregate  (cost=128981.02..129505.48 rows=13986 width=53) (actual time=14936.384..14936.387 rows=0 loops=1)
   Group Key: projects.runners_token_encrypted
   Filter: (count(*) > 1)
   Rows Removed by Filter: 10001
   Buffers: shared hit=44929 read=14034 dirtied=10
   I/O Timings: read=14451.957 write=0.000
   ->  Nested Loop  (cost=10738.45..128771.23 rows=41957 width=53) (actual time=2296.612..14892.214 rows=10001 loops=1)
         Buffers: shared hit=44929 read=14034 dirtied=10
         I/O Timings: read=14451.957 write=0.000
         ->  HashAggregate  (cost=10737.88..11294.80 rows=55692 width=53) (actual time=2287.719..2304.854 rows=10001 loops=1)
               Group Key: projects_1.runners_token_encrypted
               Buffers: shared hit=7445 read=1299
               I/O Timings: read=2238.834 write=0.000
               ->  Index Only Scan using tmp_index_projects_on_id_and_runners_token_encrypted on public.projects projects_1  (cost=0.56..10598.60 rows=55713 width=53) (actual time=40.516..2266.591 rows=10001 loops=1)
                     Index Cond: ((projects_1.id >= 1) AND (projects_1.id <= 116672))
                     Heap Fetches: 1208
                     Buffers: shared hit=7445 read=1299
                     I/O Timings: read=2238.834 write=0.000
         ->  Index Only Scan using index_projects_on_runners_token_encrypted on public.projects  (cost=0.56..2.09 rows=1 width=53) (actual time=1.253..1.255 rows=1 loops=10001)
               Index Cond: ((projects.runners_token_encrypted = (projects_1.runners_token_encrypted)::text) AND (projects.runners_token_encrypted IS NOT NULL))
               Heap Fetches: 1230
               Buffers: shared hit=37484 read=12735 dirtied=10
               I/O Timings: read=12213.124 write=0.000

The case for the non-encrypted runners_token column is the same, except that the number of rows containing values on .com is much smaller (193688 for runners_token vs 17597284 for runners_token_encrypted).
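The last query above is the core of the fix: within a batch window of project IDs, it finds token values that occur more than once anywhere in the table. A pure-Ruby sketch of that logic (the hash shape and method name are illustrative assumptions, not the migration's actual code, which runs the SQL shown above):

```ruby
# Mirror of the duplicate-detection SQL: collect the distinct
# runners_token_encrypted values inside the batch's id window, then keep
# only those that appear more than once across the whole table.
def duplicate_tokens_in_batch(projects, batch_start, batch_end)
  batch_tokens = projects
    .select { |p| p[:id].between?(batch_start, batch_end) }
    .filter_map { |p| p[:runners_token_encrypted] } # drops NULL tokens
    .uniq

  projects
    .filter_map { |p| p[:runners_token_encrypted] }
    .tally                                           # token => table-wide count
    .select { |token, count| batch_tokens.include?(token) && count > 1 }
    .keys
end
```

Note that a duplicate is detected even when only one of the colliding projects falls inside the current batch window, which is why the inner query is scoped to the window while the outer GROUP BY/HAVING scans the whole table.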
Migration output
These migrations are expected to take about 12 days to run (based on ~9000 batches for the 17.8M records currently on .com, with a 2-minute delay between batches), most of which is spent waiting. Without the wait times, the migrations themselves would take less than an hour to run.
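As a back-of-the-envelope check of that estimate (taking the ~9000-batch figure above as given and the 120-second interval from the scheduler output below):

```ruby
# Rough runtime estimate: total wait time across all scheduled batches.
batches    = 9_000   # approximate batch count quoted in this MR
interval_s = 120     # "an interval of 120 seconds" per the scheduler output

estimated_days = (batches * interval_s) / 86_400.0
puts format('~%.1f days of waiting between batches', estimated_days)
# ~12.5 days, consistent with the "about 12 days" estimate
```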
<details>
<summary>
bin/rails db:migrate
</summary>

```ruby
== 20220328095848 DeleteFailedResetDuplicateCiRunnersTokenMigrationRecords: migrating
== 20220328095848 DeleteFailedResetDuplicateCiRunnersTokenMigrationRecords: migrated (0.0194s)
== 20220328100456 Schedule20220328ResetDuplicateCiRunnersTokenEncryptedValuesOnProjects: migrating
-- Scheduled 1 ResetDuplicateCiRunnersTokenEncryptedValuesOnProjects jobs with a maximum of 10000 records per batch and an interval of 120 seconds.
The migration is expected to take at least 120 seconds. Expect all jobs to have completed after 2022-03-28 14:53:59 UTC.
== 20220328100456 Schedule20220328ResetDuplicateCiRunnersTokenEncryptedValuesOnProjects: migrated (0.1431s)
== 20220328100457 Schedule20220328ResetDuplicateCiRunnersTokenValuesOnProjects: migrating
-- Scheduled 0 ResetDuplicateCiRunnersTokenValuesOnProjects jobs with a maximum of 10000 records per batch and an interval of 120 seconds.
The migration is expected to take at least 0 seconds. Expect all jobs to have completed after 2022-03-28 14:51:59 UTC.
== 20220328100457 Schedule20220328ResetDuplicateCiRunnersTokenValuesOnProjects: migrated (0.0113s)
```
</details>
<details>
<summary>
bin/rails db:rollback
</summary>
```ruby
== 20220328100457 Schedule20220328ResetDuplicateCiRunnersTokenValuesOnProjects: reverting
== 20220328100457 Schedule20220328ResetDuplicateCiRunnersTokenValuesOnProjects: reverted (0.0000s)
== 20220328100456 Schedule20220328ResetDuplicateCiRunnersTokenEncryptedValuesOnProjects: reverting
== 20220328100456 Schedule20220328ResetDuplicateCiRunnersTokenEncryptedValuesOnProjects: reverted (0.0000s)
```
</details>

MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Closes #354027 (closed) https://gitlab.com/gitlab-org/security/gitlab/-/issues/616