Draft: Fix statement timeouts in destroying pipeline artifacts
What does this MR do and why?
Previously when Ci::PipelineArtifacts::ExpireArtifactsWorker
ran, it
would attempt to find all expired artifacts in one go and destroy them
in batches. However, the SQL query to find these expired artifacts
could do an expensive JOIN due to a nested loop between one large
table (ci_pipeline_artifacts
) and one very large table
(ci_pipelines
). In the latter table, PostgreSQL has to filter out
locked pipelines, but on GitLab.com it has to iterate through over 3
million rows.
This expensive query would lead to statement timeouts that would prevent the Sidekiq job from doing any work.
To avoid this expensive nested loop, we can reduce the amount of work
PostgreSQL has to do by iterating in batches of
ci_pipeline_artifacts
and only then filtering out locked pipelines.
Relates to #372500 (closed)
Previous query plan
gitlabhq_production=# explain (analyze, buffers) SELECT "ci_pipeline_artifacts".* FROM "ci_pipeline_artifacts" INNER JOIN "ci_pipelines" ON "ci_pipelines"."id" = "ci_pipeline_artifacts"."pipeline_id" WHERE "ci_pipelines"."locked" = 0 AND "ci_pipeline_artifacts"."expire_at" < '2022-09-03 14:08:04.682420' LIMIT 100;
Limit (cost=1.00..540.07 rows=100 width=169) (actual time=301019.448..304370.825 rows=1 loops=1)
Buffers: shared hit=3921598 read=1262175
I/O Timings: read=295778.275
-> Nested Loop (cost=1.00..2973129.73 rows=551532 width=169) (actual time=301019.447..304370.823 rows=1 loops=1)
Buffers: shared hit=3921598 read=1262175
I/O Timings: read=295778.275
-> Index Scan using index_ci_pipeline_artifacts_on_expire_at on ci_pipeline_artifacts (cost=0.43..148004.78 rows=861831 width=169) (actual time=2.472..7273.815 rows=861384 loops=1)
Index Cond: (expire_at < '2022-09-03 14:08:04.68242+00'::timestamp with time zone)
Buffers: shared hit=798784 read=77959
I/O Timings: read=5648.962
-> Index Scan using ci_pipelines_pkey on ci_pipelines (cost=0.57..3.28 rows=1 width=4) (actual time=0.344..0.344 rows=0 loops=861384)
Index Cond: (id = ci_pipeline_artifacts.pipeline_id)
Filter: (locked = 0)
Rows Removed by Filter: 1
Buffers: shared hit=3122814 read=1184216
I/O Timings: read=290129.313
Planning Time: 3.915 ms
Execution Time: 304370.874 ms
New query plan
D, [2022-09-06T18:23:04.953408 #248188] DEBUG -- : Ci::PipelineArtifact Load (776.5ms) /*application:console,db_config_name:ci_replica*/ SELECT "ci_pipeline_artifacts".* FROM "ci_pipeline_artifacts" INNER JOIN "ci_pipelines" ON "ci_pipelines"."id" = "ci_pipeline_artifacts"."pipeline_id" WHERE "ci_pipeline_artifacts"."expire_at" < '2022-09-06 18:23:02.375269' AND "ci_pipeline_artifacts"."id" >= 1286033 AND "ci_pipeline_artifacts"."id" < 1299042 AND "ci_pipelines"."locked" = 0
D, [2022-09-06T18:23:04.956880 #248188] DEBUG -- : Ci::PipelineArtifact Load (1.9ms) /*application:console,db_config_name:ci_replica*/ SELECT "ci_pipeline_artifacts"."id" FROM "ci_pipeline_artifacts" WHERE "ci_pipeline_artifacts"."expire_at" < '2022-09-06 18:23:02.375269' AND "ci_pipeline_artifacts"."id" >= 1299042 ORDER BY "ci_pipeline_artifacts"."id" ASC LIMIT 1 OFFSET 1000
gitlabhq_production=# explain (analyze, verbose) SELECT "ci_pipeline_artifacts".* FROM "ci_pipeline_artifacts" INNER JOIN "ci_pipelines" ON "ci_pipelines"."id" = "ci_pipeline_artifacts"."pipeline_id" WHERE "ci_pipeline_artifacts"."expire_at" < '2022-09-06 18:23:02.375269' AND "ci_pipeline_artifacts"."id" >= 1286033 AND "ci_pipeline_artifacts"."id" < 1299042 AND "ci_pipelines"."locked" = 0;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1.00..4247.07 rows=631 width=169) (actual time=1619.575..1619.576 rows=0 loops=1)
Output: ci_pipeline_artifacts.id, ci_pipeline_artifacts.created_at, ci_pipeline_artifacts.updated_at, ci_pipeline_artifacts.pipeline_id, ci_pipeline_artifacts.project_id, ci_pipeline_artifacts.size, ci_pipeline_artifacts.file_store, ci_pipeline_artifacts.file_type, ci_pipeline_artifacts.file_format, ci_pipeline_artifacts.file, ci_pipeline_artifacts.expire_a
t, ci_pipeline_artifacts.verification_started_at, ci_pipeline_artifacts.verification_retry_at, ci_pipeline_artifacts.verified_at, ci_pipeline_artifacts.verification_state, ci_pipeline_artifacts.verification_retry_count, ci_pipeline_artifacts.verification_checksum, ci_pipeline_artifacts.verification_failure
Inner Unique: true
-> Index Scan using ci_pipeline_artifacts_pkey on public.ci_pipeline_artifacts (cost=0.43..756.33 rows=971 width=169) (actual time=16.828..207.054 rows=1039 loops=1)
Output: ci_pipeline_artifacts.id, ci_pipeline_artifacts.created_at, ci_pipeline_artifacts.updated_at, ci_pipeline_artifacts.pipeline_id, ci_pipeline_artifacts.project_id, ci_pipeline_artifacts.size, ci_pipeline_artifacts.file_store, ci_pipeline_artifacts.file_type, ci_pipeline_artifacts.file_format, ci_pipeline_artifacts.file, ci_pipeline_artifacts.ex
pire_at, ci_pipeline_artifacts.verification_started_at, ci_pipeline_artifacts.verification_retry_at, ci_pipeline_artifacts.verified_at, ci_pipeline_artifacts.verification_state, ci_pipeline_artifacts.verification_retry_count, ci_pipeline_artifacts.verification_checksum, ci_pipeline_artifacts.verification_failure
Index Cond: ((ci_pipeline_artifacts.id >= 1286033) AND (ci_pipeline_artifacts.id < 1299042))
Filter: (ci_pipeline_artifacts.expire_at < '2022-09-06 18:23:02.375269+00'::timestamp with time zone)
-> Index Scan using ci_pipelines_pkey on public.ci_pipelines (cost=0.57..3.60 rows=1 width=4) (actual time=1.357..1.357 rows=0 loops=1039)
Output: ci_pipelines.id
Index Cond: (ci_pipelines.id = ci_pipeline_artifacts.pipeline_id)
Filter: (ci_pipelines.locked = 0)
Rows Removed by Filter: 1
Planning Time: 126.475 ms
Execution Time: 1619.645 ms
(14 rows)
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.