Skip to content

Nullify duplicate runner authentication tokens

What does this MR do and why?

Describe in detail what your merge request does and why.

This MR adds background migrations to ensure that there are no duplicate runner tokens in the ci_runners table. The goal is to later follow up with adding unique indices for two columns in the ci_runners table: token and token_encrypted. While we do have logic in the application to prevent new duplicates, there is a chance that existing tables have duplicates (as seen in the .com database). It is similar to a previous MR that performed a similar action on the projects table (runners_token* columns).

This MR performs the following actions for each of the columns:

  • Detect and nullify any duplicate runner registration tokens. This will cause the affected runners to become unusable. This should not be a problem since the only situation that could have explained this scenario is manual handling of the database rows a long time ago. These runners don't seem to be in use anymore;

Screenshots or screen recordings

These are strongly recommended to assist reviewers and reduce the time to merge your change.

description screenshot
creating duplicates and running migration
batched_background_migration_jobs table after migration

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

  1. On the GDK Postgres console connected to the ci database (gdk psql -d gitlabhq_development_ci), perform the following commands:

    SELECT id, token FROM "ci_runners" ORDER BY id LIMIT 10;
    INSERT INTO ci_runners (runner_type, token) VALUES (1, 'duplicate token');
    INSERT INTO ci_runners (runner_type, token) VALUES (1, 'duplicate token');
  2. Now that we have a couple of records with duplicate tokens, let's run the migration from this MR:

    bundle exec rails db:migrate

    in log/application.log you should be able to confirm that the migrations went through:

    17027:2022-09-23T11:37:58.405Z: {:message=>"BatchedJob transition", :batched_job_id=>12, :previous_state=>:pending, :new_state=>:running, :batched_migration_id=>12, :job_class_name=>"ResetDuplicateCiRunnersTokenEncryptedValues", :job_arguments=>[], :exception_class=>nil, :exception_message=>nil}
    17028:2022-09-23T11:37:58.520Z: {:message=>"BatchedJob transition", :batched_job_id=>12, :previous_state=>:running, :new_state=>:succeeded, :batched_migration_id=>12, :job_class_name=>"ResetDuplicateCiRunnersTokenEncryptedValues", :job_arguments=>[], :exception_class=>nil, :exception_message=>nil}
    17280:2022-09-23T11:42:03.052Z: {:message=>"BatchedJob transition", :batched_job_id=>14, :previous_state=>:pending, :new_state=>:running, :batched_migration_id=>14, :job_class_name=>"ResetDuplicateCiRunnersTokenEncryptedValues", :job_arguments=>[], :exception_class=>nil, :exception_message=>nil}
    17281:2022-09-23T11:42:03.166Z: {:message=>"BatchedJob transition", :batched_job_id=>14, :previous_state=>:running, :new_state=>:succeeded, :batched_migration_id=>14, :job_class_name=>"ResetDuplicateCiRunnersTokenEncryptedValues", :job_arguments=>[], :exception_class=>nil, :exception_message=>nil}
  3. Once the migration is complete, the new runners should have their tokens nullified. To verify:

    SELECT id, token FROM "ci_runners" ORDER BY id LIMIT 10;
    SELECT * FROM batched_background_migration_jobs;

Migration output

These migrations are expected to take about 30 hours to run (based on ~1800 batches given 17.8M records currently in .com and a 1-minute delay between batches), most of which is spent waiting. Without the wait times, it would be less than one hour to run the migrations.

bin/rails db:migrate
main: == 20220922143143 ScheduleResetDuplicateCiRunnerTokenValues: migrating ========
main: -- The migration is skipped since it modifies the schemas: [:gitlab_ci].
main: -- This database can only apply migrations in one of the following schemas: [:gitlab_main, :gitlab_shared, :gitlab_internal].
main: == 20220922143143 ScheduleResetDuplicateCiRunnerTokenValues: migrated (0.0001s)

main: == 20220922143634 ScheduleResetDuplicateCiRunnerTokenEncryptedValues: migrating
main: -- The migration is skipped since it modifies the schemas: [:gitlab_ci].
main: -- This database can only apply migrations in one of the following schemas: [:gitlab_main, :gitlab_shared, :gitlab_internal].
main: == 20220922143634 ScheduleResetDuplicateCiRunnerTokenEncryptedValues: migrated (0.0000s)

ci: == 20220922143143 ScheduleResetDuplicateCiRunnerTokenValues: migrating ========
ci: == 20220922143143 ScheduleResetDuplicateCiRunnerTokenValues: migrated (0.0292s)

ci: == 20220922143634 ScheduleResetDuplicateCiRunnerTokenEncryptedValues: migrating
ci: == 20220922143634 ScheduleResetDuplicateCiRunnerTokenEncryptedValues: migrated (0.0136s)
bin/rails db:rollback RAILS_ENV=development STEP=2
main: == 20220922143634 ScheduleResetDuplicateCiRunnerTokenEncryptedValues: reverting
main: -- The migration is skipped since it modifies the schemas: [:gitlab_ci].
main: -- This database can only apply migrations in one of the following schemas: [:gitlab_main, :gitlab_shared, :gitlab_internal].
main: == 20220922143634 ScheduleResetDuplicateCiRunnerTokenEncryptedValues: reverted (0.0001s)

ci: == 20220922143634 ScheduleResetDuplicateCiRunnerTokenEncryptedValues: reverting
ci: == 20220922143634 ScheduleResetDuplicateCiRunnerTokenEncryptedValues: reverted (0.0196s)

main: == 20220922143143 ScheduleResetDuplicateCiRunnerTokenValues: reverting ========
main: -- The migration is skipped since it modifies the schemas: [:gitlab_ci].
main: -- This database can only apply migrations in one of the following schemas: [:gitlab_main, :gitlab_shared, :gitlab_internal].
main: == 20220922143143 ScheduleResetDuplicateCiRunnerTokenValues: reverted (0.0001s)

ci: == 20220922143143 ScheduleResetDuplicateCiRunnerTokenValues: reverting ========
ci: == 20220922143143 ScheduleResetDuplicateCiRunnerTokenValues: reverted (0.0228s)

Database query plans

https://gitlab.com/gitlab-org/gitlab/blob/pedropombeiro%2F375140%2F1-nullify-duplicate-tokens/lib/gitlab/background_migration/reset_duplicate_ci_runners_token_values.rb#L17-21

Job link: https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/12487/commands/44123

SELECT "ci_runners"."token"
FROM "ci_runners"
WHERE "ci_runners"."token" IN ( SELECT DISTINCT "ci_runners"."token"
    FROM "ci_runners"
    WHERE "ci_runners"."id" BETWEEN 1 AND 2000
      AND "ci_runners"."id" >= 1
      AND "ci_runners"."id" < 2000)
GROUP BY "ci_runners"."token"
HAVING (COUNT(*) > 1)

Query plan:

 HashAggregate  (cost=2661.16..2670.73 rows=255 width=27) (actual time=556.107..556.109 rows=0 loops=1)
   Group Key: ci_runners.token
   Filter: (count(*) > 1)
   Rows Removed by Filter: 654
   Buffers: shared hit=1741 read=707 dirtied=107
   I/O Timings: read=533.858 write=0.000
   ->  Nested Loop  (cost=1149.89..2657.34 rows=765 width=27) (actual time=121.841..554.178 rows=654 loops=1)
         Buffers: shared hit=1741 read=707 dirtied=107
         I/O Timings: read=533.858 write=0.000
         ->  HashAggregate  (cost=1149.46..1157.11 rows=765 width=27) (actual time=118.307..118.913 rows=654 loops=1)
               Group Key: ci_runners_1.token
               Buffers: shared hit=185 read=142 dirtied=107
               I/O Timings: read=109.985 write=0.000
               ->  Index Scan using ci_runners_pkey on public.ci_runners ci_runners_1  (cost=0.43..1147.55 rows=766 width=27) (actual time=3.695..117.538 rows=654 loops=1)
                     Index Cond: ((ci_runners_1.id >= 1) AND (ci_runners_1.id <= 2000) AND (ci_runners_1.id >= 1) AND (ci_runners_1.id < 2000))
                     Buffers: shared hit=185 read=142 dirtied=107
                     I/O Timings: read=109.985 write=0.000
         ->  Index Only Scan using index_ci_runners_on_token on public.ci_runners  (cost=0.43..1.94 rows=1 width=27) (actual time=0.663..0.664 rows=1 loops=654)
               Index Cond: (ci_runners.token = (ci_runners_1.token)::text)
               Heap Fetches: 158
               Buffers: shared hit=1556 read=565
               I/O Timings: read=423.873 write=0.000
https://gitlab.com/gitlab-org/gitlab/blob/pedropombeiro%2F375140%2F1-nullify-duplicate-tokens/lib/gitlab/background_migration/reset_duplicate_ci_runners_token_encrypted_values.rb#L17-21

Job link: https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/12487/commands/44124

SELECT "ci_runners"."token_encrypted"
FROM "ci_runners"
WHERE "ci_runners"."token_encrypted" IN ( SELECT DISTINCT "ci_runners"."token_encrypted"
    FROM "ci_runners"
    WHERE "ci_runners"."id" BETWEEN 1 AND 2000
      AND "ci_runners"."id" >= 1
      AND "ci_runners"."id" < 2000)
GROUP BY "ci_runners"."token_encrypted"
HAVING (COUNT(*) > 1)

Query plan:

 HashAggregate  (cost=2772.39..2781.97 rows=255 width=56) (actual time=585.707..585.710 rows=0 loops=1)
   Group Key: ci_runners.token_encrypted
   Filter: (count(*) > 1)
   Rows Removed by Filter: 654
   Buffers: shared hit=2411 read=703
   I/O Timings: read=567.887 write=0.000
   ->  Nested Loop  (cost=1150.02..2768.56 rows=766 width=56) (actual time=7.371..583.747 rows=654 loops=1)
         Buffers: shared hit=2411 read=703
         I/O Timings: read=567.887 write=0.000
         ->  HashAggregate  (cost=1149.46..1157.12 rows=766 width=56) (actual time=1.077..1.718 rows=654 loops=1)
               Group Key: ci_runners_1.token_encrypted
               Buffers: shared hit=327
               I/O Timings: read=0.000 write=0.000
               ->  Index Scan using ci_runners_pkey on public.ci_runners ci_runners_1  (cost=0.43..1147.55 rows=766 width=56) (actual time=0.041..0.754 rows=654 loops=1)
                     Index Cond: ((ci_runners_1.id >= 1) AND (ci_runners_1.id <= 2000) AND (ci_runners_1.id >= 1) AND (ci_runners_1.id < 2000))
                     Buffers: shared hit=327
                     I/O Timings: read=0.000 write=0.000
         ->  Index Only Scan using index_ci_runners_on_token_encrypted on public.ci_runners  (cost=0.55..2.08 rows=1 width=56) (actual time=0.884..0.888 rows=1 loops=654)
               Index Cond: (ci_runners.token_encrypted = (ci_runners_1.token_encrypted)::text)
               Heap Fetches: 158
               Buffers: shared hit=2084 read=703
               I/O Timings: read=567.887 write=0.000

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Part of Replace runner authentication token indices wit... (#375140 - closed) https://gitlab.com/gitlab-org/security/gitlab/-/issues/616+

Merge request reports