Add cronjob to clean up stale runner machines
What does this MR do and why?
Describe in detail what your merge request does and why.
This MR adds a cronjob that cleans up stale ci_runner_machines records (where contacted_at happened more than 7 days ago).
Part of #387398 (closed)
Related MRs
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
Database query plan
SQL
INSERT INTO ci_runner_machines (runner_id, machine_xid, created_at, updated_at)
SELECT 1000 AS runner_id,
uuid_in(md5(random()::text || random()::text)::cstring) AS machine_xid, NOW() AS created_at, NOW() AS updated_at
FROM generate_series(1, 3000);
INSERT INTO ci_runner_machines (runner_id, machine_xid, created_at, updated_at)
SELECT 1000 AS runner_id,
uuid_in(md5(random()::text || random()::text)::cstring) AS machine_xid, '2022-01-01' AS created_at, NOW() AS updated_at
FROM generate_series(1, 3000);
DELETE FROM "ci_runner_machines" WHERE "ci_runner_machines"."id" IN (SELECT "ci_runner_machines"."id" FROM "ci_runner_machines" WHERE "ci_runner_machines"."id" IN (SELECT "ci_runner_machines"."id" FROM ((SELECT "ci_runner_machines".* FROM "ci_runner_machines" WHERE "ci_runner_machines"."contacted_at" IS NULL)
UNION ALL
(SELECT "ci_runner_machines".* FROM "ci_runner_machines" WHERE "ci_runner_machines"."contacted_at" <= '2023-01-17 10:06:04.652717')) ci_runner_machines WHERE "ci_runner_machines"."created_at" <= '2023-01-17 10:06:04.651305') LIMIT 1000)
Query plan
https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/14766/commands/51599
ModifyTable on public.ci_runner_machines (cost=441.01..666.81 rows=262 width=38) (actual time=11.168..11.174 rows=0 loops=1)
Buffers: shared hit=7130
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=441.01..666.81 rows=262 width=38) (actual time=8.303..10.350 rows=1000 loops=1)
Buffers: shared hit=6130
I/O Timings: read=0.000 write=0.000
-> HashAggregate (cost=440.73..442.73 rows=200 width=40) (actual time=8.295..8.490 rows=1000 loops=1)
Group Key: "ANY_subquery".id
Buffers: shared hit=3130
I/O Timings: read=0.000 write=0.000
-> Subquery Scan on ANY_subquery (cost=211.65..440.07 rows=262 width=40) (actual time=5.323..7.780 rows=1000 loops=1)
Buffers: shared hit=3130
I/O Timings: read=0.000 write=0.000
-> Limit (cost=211.65..437.45 rows=262 width=8) (actual time=5.297..7.505 rows=1000 loops=1)
Buffers: shared hit=3130
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=211.65..437.45 rows=262 width=8) (actual time=5.295..7.415 rows=1000 loops=1)
Buffers: shared hit=3130
I/O Timings: read=0.000 write=0.000
-> HashAggregate (cost=211.37..213.37 rows=200 width=8) (actual time=5.254..5.476 rows=1000 loops=1)
Group Key: "*SELECT* 1".id
Buffers: shared hit=130
I/O Timings: read=0.000 write=0.000
-> Append (cost=0.28..210.71 rows=262 width=8) (actual time=0.074..3.920 rows=3000 loops=1)
Buffers: shared hit=130
I/O Timings: read=0.000 write=0.000
-> Subquery Scan on *SELECT* 1 (cost=0.28..20.06 rows=4 width=8) (actual time=0.073..3.637 rows=3000 loops=1)
Buffers: shared hit=128
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_ci_runner_machines_on_contacted_at_desc_and_id_desc on public.ci_runner_machines ci_runner_machines_2 (cost=0.28..20.02 rows=4 width=266) (actual time=0.073..3.310 rows=3000 loops=1)
Index Cond: (ci_runner_machines_2.contacted_at IS NULL)
Filter: (ci_runner_machines_2.created_at <= '2023-01-17 10:06:04.651305+00'::timestamp with time zone)
Rows Removed by Filter: 3000
Buffers: shared hit=128
I/O Timings: read=0.000 write=0.000
-> Subquery Scan on *SELECT* 2 (cost=0.28..189.34 rows=258 width=8) (actual time=0.009..0.010 rows=0 loops=1)
Buffers: shared hit=2
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_ci_runner_machines_on_contacted_at_desc_and_id_desc on public.ci_runner_machines ci_runner_machines_3 (cost=0.28..186.76 rows=258 width=266) (actual time=0.008..0.008 rows=0 loops=1)
Index Cond: (ci_runner_machines_3.contacted_at <= '2023-01-17 10:06:04.652717+00'::timestamp with time zone)
Filter: (ci_runner_machines_3.created_at <= '2023-01-17 10:06:04.651305+00'::timestamp with time zone)
Rows Removed by Filter: 0
Buffers: shared hit=2
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using ci_runner_machines_pkey on public.ci_runner_machines ci_runner_machines_1 (cost=0.28..1.21 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1000)
Index Cond: (ci_runner_machines_1.id = "*SELECT* 1".id)
Heap Fetches: 1000
Buffers: shared hit=3000
I/O Timings: read=0.000 write=0.000
-> Index Scan using ci_runner_machines_pkey on public.ci_runner_machines (cost=0.28..1.21 rows=1 width=14) (actual time=0.001..0.001 rows=1 loops=1000)
Index Cond: (ci_runner_machines.id = "ANY_subquery".id)
Buffers: shared hit=3000
I/O Timings: read=0.000 write=0.000
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Edited by Pedro Pombeiro