
Nullify duplicate project runner tokens

What does this MR do and why?


This MR adds background migrations to ensure that there are no duplicate runner registration tokens in the projects table. The goal is to follow up by adding unique indexes on two columns in the projects table: runners_token and runners_token_encrypted. While the application already has logic to prevent new duplicates, existing rows may still contain duplicates (as seen in the .com database).

This MR performs the following actions for each of the two columns (a hedged sketch of the scheduling migration follows the list):

1. Adds a temporary index on (id, <column>) so the batching queries can run as index-only scans.
2. Schedules background migration jobs over projects with a non-NULL token value (10,000 records per batch) that nullify any token values appearing more than once.
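
For illustration, the scheduling migration for the encrypted column could look roughly like the sketch below. This is a hedged reconstruction based on GitLab's standard background-migration helpers and the constants visible in the migration output further down, not the exact code in the MR:

class ScheduleResetDuplicateCiRunnersTokenEncryptedValuesOnProjects < Gitlab::Database::Migration[1.0]
  MIGRATION = 'ResetDuplicateCiRunnersTokenEncryptedValuesOnProjects'
  INDEX_NAME = 'tmp_index_projects_on_id_and_runners_token_encrypted'
  BATCH_SIZE = 10_000
  DELAY_INTERVAL = 2.minutes

  disable_ddl_transaction!

  def up
    # Temporary index so the batching queries can use index-only scans.
    add_concurrent_index :projects, [:id, :runners_token_encrypted], name: INDEX_NAME

    queue_background_migration_jobs_by_range_at_intervals(
      define_batchable_model('projects').where.not(runners_token_encrypted: nil),
      MIGRATION,
      DELAY_INTERVAL,
      batch_size: BATCH_SIZE,
      track_jobs: true
    )
  end

  def down
    # No-op: the rollback output below completes instantly, which suggests the
    # temporary index is left in place for a later cleanup migration.
  end
end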


Migration output

These migrations are expected to take about 30 hours to run (based on ~1,800 batches for the 17.8M records currently on .com, with a 1-minute delay between batches), most of which is spent waiting. Without the wait times, the migrations themselves would take less than an hour.
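As a quick back-of-envelope check of that estimate, using the figures above (the exact batch count is an assumption derived from them):

records    = 17_800_000
batch_size = 10_000
batches    = (records / batch_size.to_f).ceil  # => 1780 batches
hours      = batches * 60 / 3600.0             # one-minute delay per batch
# => ~29.7 hours, almost all of it spent waiting between batches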

bin/rails db:migrate
== 20220308115219 ScheduleResetDuplicateCiRunnersTokenEncryptedValuesOnProjects: migrating
-- transaction_open?()
   -> 0.0000s
-- index_exists?("projects", [:id, :runners_token_encrypted], {:unique=>false, :name=>"tmp_index_projects_on_id_and_runners_token_encrypted", :algorithm=>:concurrently})
   -> 0.0254s
-- execute("SET statement_timeout TO 0")
   -> 0.0007s
-- add_index("projects", [:id, :runners_token_encrypted], {:unique=>false, :name=>"tmp_index_projects_on_id_and_runners_token_encrypted", :algorithm=>:concurrently})
   -> 0.0047s
-- execute("RESET statement_timeout")
   -> 0.0007s
-- Scheduled 1 ResetDuplicateCiRunnersTokenEncryptedValuesOnProjects jobs with a maximum of 10000 records per batch and an interval of 120 seconds.

The migration is expected to take at least 120 seconds. Expect all jobs to have completed after 2022-03-14 14:43:17 UTC.
-- transaction_open?()
   -> 0.0000s
-- index_exists?("projects", {:name=>"tmp_index_projects_on_id_and_runners_token_encrypted"}, {:algorithm=>:concurrently})
   -> 0.0215s
== 20220308115219 ScheduleResetDuplicateCiRunnersTokenEncryptedValuesOnProjects: migrated (0.2702s)

== 20220308115502 ScheduleResetDuplicateCiRunnersTokenValuesOnProjects: migrating
-- transaction_open?()
   -> 0.0000s
-- index_exists?("projects", [:id, :runners_token], {:unique=>false, :name=>"tmp_index_projects_on_id_and_runners_token", :algorithm=>:concurrently})
   -> 0.0207s
-- add_index("projects", [:id, :runners_token], {:unique=>false, :name=>"tmp_index_projects_on_id_and_runners_token", :algorithm=>:concurrently})
   -> 0.0036s
-- Scheduled 0 ResetDuplicateCiRunnersTokenValuesOnProjects jobs with a maximum of 10000 records per batch and an interval of 120 seconds.

The migration is expected to take at least 0 seconds. Expect all jobs to have completed after 2022-03-14 14:41:17 UTC.
-- transaction_open?()
   -> 0.0000s
-- index_exists?("projects", {:name=>"tmp_index_projects_on_id_and_runners_token"}, {:algorithm=>:concurrently})
   -> 0.0211s
== 20220308115502 ScheduleResetDuplicateCiRunnersTokenValuesOnProjects: migrated (0.0565s)
bin/rails db:rollback RAILS_ENV=development STEP=2
== 20220308115502 ScheduleResetDuplicateCiRunnersTokenValuesOnProjects: reverting
== 20220308115502 ScheduleResetDuplicateCiRunnersTokenValuesOnProjects: reverted (0.0000s)

== 20220308115219 ScheduleResetDuplicateCiRunnersTokenEncryptedValuesOnProjects: reverting
== 20220308115219 ScheduleResetDuplicateCiRunnersTokenEncryptedValuesOnProjects: reverted (0.0000s)

Database query plans

runners_token_encrypted

Fetch initial record

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9157/commands/32465

SELECT
    "projects"."id"
FROM
    "projects"
WHERE
    "projects"."runners_token_encrypted" IS NOT NULL
ORDER BY
    "projects".id ASC
LIMIT 1
 Limit  (cost=0.56..0.60 rows=1 width=4) (actual time=0.331..0.332 rows=1 loops=1)
   Buffers: shared hit=1 read=4
   I/O Timings: read=0.243 write=0.000
   ->  Index Only Scan using tmp_index_projects_on_id_and_runners_token_encrypted on public.projects  (cost=0.56..621254.58 rows=17816628 width=4) (actual time=0.329..0.329 rows=1 loops=1)
         Heap Fetches: 0
         Buffers: shared hit=1 read=4
         I/O Timings: read=0.243 write=0.000
Fetch end of batch

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9157/commands/32466

SELECT
    "projects".id
FROM
    "projects"
WHERE
    "projects".runners_token_encrypted IS NOT NULL
    AND "projects".id >= 1
ORDER BY
    "projects".id ASC
LIMIT 1 OFFSET 10000
 Limit  (cost=374.26..374.29 rows=1 width=4) (actual time=41.750..41.752 rows=1 loops=1)
   Buffers: shared hit=9360 read=562 dirtied=174
   I/O Timings: read=34.669 write=0.000
   ->  Index Only Scan using tmp_index_projects_on_id_and_runners_token_encrypted on public.projects  (cost=0.56..665796.15 rows=17816628 width=4) (actual time=0.024..41.118 rows=10001 loops=1)
         Index Cond: (projects.id >= 1)
         Heap Fetches: 464
         Buffers: shared hit=9360 read=562 dirtied=174
         I/O Timings: read=34.669 write=0.000
Fetch records in window

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9157/commands/32468

SELECT
    "projects".id
FROM
    "projects"
WHERE
    "projects".runners_token_encrypted IS NOT NULL
    AND "projects".id >= 1
    AND "projects".id < 116681
 Index Only Scan using tmp_index_projects_on_id_and_runners_token_encrypted on public.projects  (cost=0.56..5500.76 rows=51725 width=4) (actual time=0.029..4.774 rows=10000 loops=1)
   Index Cond: ((projects.id >= 1) AND (projects.id < 116681))
   Heap Fetches: 464
   Buffers: shared hit=9921
   I/O Timings: read=0.000 write=0.000
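
The three plans above correspond to the id-range walk performed while scheduling the jobs. A simplified sketch of that loop follows; the names (Project, the worker call) are illustrative, and the real helper also closes the final, open-ended batch with the table's maximum id:

scope      = Project.where.not(runners_token_encrypted: nil)
batch_size = 10_000
interval   = 2.minutes

start_id = scope.order(:id).limit(1).pick(:id)  # "Fetch initial record"
index    = 1

while start_id
  # "Fetch end of batch": the id 10,000 rows ahead of the current position.
  end_id = scope.where('id >= ?', start_id)
                .order(:id).offset(batch_size).limit(1).pick(:id)

  # Each job later re-reads the ids in [start_id, end_id), which is the
  # "Fetch records in window" plan above.
  BackgroundMigrationWorker.perform_in(
    interval * index,
    'ResetDuplicateCiRunnersTokenEncryptedValuesOnProjects',
    [start_id, end_id]
  )

  start_id = end_id
  index += 1
end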
Find the duplicate tokens from the batch

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9157/commands/32473

SELECT
    "projects"."id"
FROM
    "projects"
WHERE
    "projects"."id" IN (<10000 tokens>)
GROUP BY
    "projects"."id"
HAVING (COUNT(*) > 1)
 Aggregate  (cost=0.56..21079.67 rows=3333 width=4) (actual time=134.946..134.948 rows=0 loops=1)
   Group Key: projects.id
   Filter: (count(*) > 1)
   Rows Removed by Filter: 10000
   Buffers: shared hit=49623 read=201 dirtied=7
   I/O Timings: read=101.612 write=0.000
   ->  Index Only Scan using projects_pkey on public.projects  (cost=0.56..20904.67 rows=10000 width=4) (actual time=13.279..129.595 rows=10000 loops=1)
         Index Cond: (projects.id = ANY ('{...}'::integer[]))
         Heap Fetches: 472
         Buffers: shared hit=49623 read=201 dirtied=7
         I/O Timings: read=101.612 write=0.000
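
Inside each scheduled job, the duplicate lookup feeds an update that nullifies the offending values. A minimal sketch, assuming the job plucks the batch's token values and matches duplicates by value (the MR's actual job class and SQL may differ):

class ResetDuplicateCiRunnersTokenEncryptedValuesOnProjects
  # Background migrations define their own minimal model rather than
  # reusing the application's Project class.
  class Project < ActiveRecord::Base
    self.table_name = 'projects'
  end

  def perform(start_id, end_id)
    tokens = Project.where(id: start_id..end_id)
                    .where.not(runners_token_encrypted: nil)
                    .pluck(:runners_token_encrypted)

    duplicates = Project.where(runners_token_encrypted: tokens)
                        .group(:runners_token_encrypted)
                        .having('COUNT(*) > 1')
                        .pluck(:runners_token_encrypted)

    Project.where(runners_token_encrypted: duplicates)
           .update_all(runners_token_encrypted: nil) unless duplicates.empty?
  end
end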

The queries and plans for the non-encrypted runners_token column are the same, except that far fewer rows hold a value on .com (193,688 for runners_token vs. 17,597,284 for runners_token_encrypted).
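Once the duplicates are nullified, the unique indexes mentioned at the top can be added in a follow-up. A hypothetical sketch (class and index names are placeholders, not from this MR):

class AddUniqueIndexesToProjectRunnerTokens < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  def up
    add_concurrent_index :projects, :runners_token, unique: true,
                         name: 'unique_index_projects_on_runners_token'
    add_concurrent_index :projects, :runners_token_encrypted, unique: true,
                         name: 'unique_index_projects_on_runners_token_encrypted'
  end

  def down
    remove_concurrent_index_by_name :projects, 'unique_index_projects_on_runners_token'
    remove_concurrent_index_by_name :projects, 'unique_index_projects_on_runners_token_encrypted'
  end
end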

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Part of #354027 (closed)
