Add cronjob to clean up stale runner machines

What does this MR do and why?

Describe in detail what your merge request does and why.

This MR adds a cronjob that cleans up stale ci_runner_machines records (where contacted_at happened more than 7 days ago).

Part of #387398 (closed)

Related MRs

Blueprint: https://docs.gitlab.com/ee/architecture/blueprints/runner_tokens/#reusing-the-runner-authentication-token-across-many-machines

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Database query plan

SQL
INSERT INTO ci_runner_machines (runner_id, machine_xid, created_at, updated_at)
SELECT 1000 AS runner_id,
  uuid_in(md5(random()::text || random()::text)::cstring) AS machine_xid, NOW() AS created_at, NOW() AS updated_at
FROM generate_series(1, 3000);

INSERT INTO ci_runner_machines (runner_id, machine_xid, created_at, updated_at)
SELECT 1000 AS runner_id,
  uuid_in(md5(random()::text || random()::text)::cstring) AS machine_xid, '2022-01-01' AS created_at, NOW() AS updated_at
FROM generate_series(1, 3000);

DELETE FROM "ci_runner_machines" WHERE "ci_runner_machines"."id" IN (SELECT "ci_runner_machines"."id" FROM "ci_runner_machines" WHERE "ci_runner_machines"."id" IN (SELECT "ci_runner_machines"."id" FROM ((SELECT "ci_runner_machines".* FROM "ci_runner_machines" WHERE "ci_runner_machines"."contacted_at" IS NULL)
UNION ALL
(SELECT "ci_runner_machines".* FROM "ci_runner_machines" WHERE "ci_runner_machines"."contacted_at" <= '2023-01-17 10:06:04.652717')) ci_runner_machines WHERE "ci_runner_machines"."created_at" <= '2023-01-17 10:06:04.651305') LIMIT 1000)
Query plan

https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/14766/commands/51599

 ModifyTable on public.ci_runner_machines  (cost=441.01..666.81 rows=262 width=38) (actual time=11.168..11.174 rows=0 loops=1)
   Buffers: shared hit=7130
   I/O Timings: read=0.000 write=0.000
   ->  Nested Loop  (cost=441.01..666.81 rows=262 width=38) (actual time=8.303..10.350 rows=1000 loops=1)
         Buffers: shared hit=6130
         I/O Timings: read=0.000 write=0.000
         ->  HashAggregate  (cost=440.73..442.73 rows=200 width=40) (actual time=8.295..8.490 rows=1000 loops=1)
               Group Key: "ANY_subquery".id
               Buffers: shared hit=3130
               I/O Timings: read=0.000 write=0.000
               ->  Subquery Scan on ANY_subquery  (cost=211.65..440.07 rows=262 width=40) (actual time=5.323..7.780 rows=1000 loops=1)
                     Buffers: shared hit=3130
                     I/O Timings: read=0.000 write=0.000
                     ->  Limit  (cost=211.65..437.45 rows=262 width=8) (actual time=5.297..7.505 rows=1000 loops=1)
                           Buffers: shared hit=3130
                           I/O Timings: read=0.000 write=0.000
                           ->  Nested Loop  (cost=211.65..437.45 rows=262 width=8) (actual time=5.295..7.415 rows=1000 loops=1)
                                 Buffers: shared hit=3130
                                 I/O Timings: read=0.000 write=0.000
                                 ->  HashAggregate  (cost=211.37..213.37 rows=200 width=8) (actual time=5.254..5.476 rows=1000 loops=1)
                                       Group Key: "*SELECT* 1".id
                                       Buffers: shared hit=130
                                       I/O Timings: read=0.000 write=0.000
                                       ->  Append  (cost=0.28..210.71 rows=262 width=8) (actual time=0.074..3.920 rows=3000 loops=1)
                                             Buffers: shared hit=130
                                             I/O Timings: read=0.000 write=0.000
                                             ->  Subquery Scan on *SELECT* 1  (cost=0.28..20.06 rows=4 width=8) (actual time=0.073..3.637 rows=3000 loops=1)
                                                   Buffers: shared hit=128
                                                   I/O Timings: read=0.000 write=0.000
                                                   ->  Index Scan using index_ci_runner_machines_on_contacted_at_desc_and_id_desc on public.ci_runner_machines ci_runner_machines_2  (cost=0.28..20.02 rows=4 width=266) (actual time=0.073..3.310 rows=3000 loops=1)
                                                         Index Cond: (ci_runner_machines_2.contacted_at IS NULL)
                                                         Filter: (ci_runner_machines_2.created_at <= '2023-01-17 10:06:04.651305+00'::timestamp with time zone)
                                                         Rows Removed by Filter: 3000
                                                         Buffers: shared hit=128
                                                         I/O Timings: read=0.000 write=0.000
                                             ->  Subquery Scan on *SELECT* 2  (cost=0.28..189.34 rows=258 width=8) (actual time=0.009..0.010 rows=0 loops=1)
                                                   Buffers: shared hit=2
                                                   I/O Timings: read=0.000 write=0.000
                                                   ->  Index Scan using index_ci_runner_machines_on_contacted_at_desc_and_id_desc on public.ci_runner_machines ci_runner_machines_3  (cost=0.28..186.76 rows=258 width=266) (actual time=0.008..0.008 rows=0 loops=1)
                                                         Index Cond: (ci_runner_machines_3.contacted_at <= '2023-01-17 10:06:04.652717+00'::timestamp with time zone)
                                                         Filter: (ci_runner_machines_3.created_at <= '2023-01-17 10:06:04.651305+00'::timestamp with time zone)
                                                         Rows Removed by Filter: 0
                                                         Buffers: shared hit=2
                                                         I/O Timings: read=0.000 write=0.000
                                 ->  Index Only Scan using ci_runner_machines_pkey on public.ci_runner_machines ci_runner_machines_1  (cost=0.28..1.21 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1000)
                                       Index Cond: (ci_runner_machines_1.id = "*SELECT* 1".id)
                                       Heap Fetches: 1000
                                       Buffers: shared hit=3000
                                       I/O Timings: read=0.000 write=0.000
         ->  Index Scan using ci_runner_machines_pkey on public.ci_runner_machines  (cost=0.28..1.21 rows=1 width=14) (actual time=0.001..0.001 rows=1 loops=1000)
               Index Cond: (ci_runner_machines.id = "ANY_subquery".id)
               Buffers: shared hit=3000
               I/O Timings: read=0.000 write=0.000

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Pedro Pombeiro

Merge request reports

Loading