Skip to content

Draft: Fix statement timeouts in destroying pipeline artifacts

What does this MR do and why?

Previously when Ci::PipelineArtifacts::ExpireArtifactsWorker ran, it would attempt to find all expired artifacts in one go and destroy them in batches. However, the SQL query to find these expired artifacts could do an expensive JOIN due to a nested loop between one large table (ci_pipeline_artifacts) and one very large table (ci_pipelines). In the latter table, PostgreSQL has to filter out locked pipelines, but on GitLab.com it has to iterate through over 3 million rows.

This expensive query would lead to statement timeouts that would prevent the Sidekiq job from doing any work.

To avoid this expensive nested loop, we can reduce the amount of work PostgreSQL has to do by iterating in batches of ci_pipeline_artifacts and only then filtering out locked pipelines.

Relates to #372500 (closed)

Previous query plan

gitlabhq_production=# explain (analyze, buffers) SELECT "ci_pipeline_artifacts".* FROM "ci_pipeline_artifacts" INNER JOIN "ci_pipelines" ON "ci_pipelines"."id" = "ci_pipeline_artifacts"."pipeline_id" WHERE "ci_pipelines"."locked" = 0 AND "ci_pipeline_artifacts"."expire_at" < '2022-09-03 14:08:04.682420' LIMIT 100;
 Limit  (cost=1.00..540.07 rows=100 width=169) (actual time=301019.448..304370.825 rows=1 loops=1)
   Buffers: shared hit=3921598 read=1262175
   I/O Timings: read=295778.275
   ->  Nested Loop  (cost=1.00..2973129.73 rows=551532 width=169) (actual time=301019.447..304370.823 rows=1 loops=1)
         Buffers: shared hit=3921598 read=1262175
         I/O Timings: read=295778.275
         ->  Index Scan using index_ci_pipeline_artifacts_on_expire_at on ci_pipeline_artifacts  (cost=0.43..148004.78 rows=861831 width=169) (actual time=2.472..7273.815 rows=861384 loops=1)
               Index Cond: (expire_at < '2022-09-03 14:08:04.68242+00'::timestamp with time zone)
               Buffers: shared hit=798784 read=77959
               I/O Timings: read=5648.962
         ->  Index Scan using ci_pipelines_pkey on ci_pipelines  (cost=0.57..3.28 rows=1 width=4) (actual time=0.344..0.344 rows=0 loops=861384)
               Index Cond: (id = ci_pipeline_artifacts.pipeline_id)
               Filter: (locked = 0)
               Rows Removed by Filter: 1
               Buffers: shared hit=3122814 read=1184216
               I/O Timings: read=290129.313
 Planning Time: 3.915 ms
 Execution Time: 304370.874 ms

New query plan

D, [2022-09-06T18:23:04.953408 #248188] DEBUG -- :   Ci::PipelineArtifact Load (776.5ms)  /*application:console,db_config_name:ci_replica*/ SELECT "ci_pipeline_artifacts".* FROM "ci_pipeline_artifacts" INNER JOIN "ci_pipelines" ON "ci_pipelines"."id" = "ci_pipeline_artifacts"."pipeline_id" WHERE "ci_pipeline_artifacts"."expire_at" < '2022-09-06 18:23:02.375269' AND "ci_pipeline_artifacts"."id" >= 1286033 AND "ci_pipeline_artifacts"."id" < 1299042 AND "ci_pipelines"."locked" = 0
D, [2022-09-06T18:23:04.956880 #248188] DEBUG -- :   Ci::PipelineArtifact Load (1.9ms)  /*application:console,db_config_name:ci_replica*/ SELECT "ci_pipeline_artifacts"."id" FROM "ci_pipeline_artifacts" WHERE "ci_pipeline_artifacts"."expire_at" < '2022-09-06 18:23:02.375269' AND "ci_pipeline_artifacts"."id" >= 1299042 ORDER BY "ci_pipeline_artifacts"."id" ASC LIMIT 1 OFFSET 1000
gitlabhq_production=# explain (analyze, verbose) SELECT "ci_pipeline_artifacts".* FROM "ci_pipeline_artifacts" INNER JOIN "ci_pipelines" ON "ci_pipelines"."id" = "ci_pipeline_artifacts"."pipeline_id" WHERE "ci_pipeline_artifacts"."expire_at" < '2022-09-06 18:23:02.375269' AND "ci_pipeline_artifacts"."id" >= 1286033 AND "ci_pipeline_artifacts"."id" < 1299042 AND "ci_pipelines"."locked" = 0;
                                                                                                                                                                                                                                                                                                                                             QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.00..4247.07 rows=631 width=169) (actual time=1619.575..1619.576 rows=0 loops=1)
   Output: ci_pipeline_artifacts.id, ci_pipeline_artifacts.created_at, ci_pipeline_artifacts.updated_at, ci_pipeline_artifacts.pipeline_id, ci_pipeline_artifacts.project_id, ci_pipeline_artifacts.size, ci_pipeline_artifacts.file_store, ci_pipeline_artifacts.file_type, ci_pipeline_artifacts.file_format, ci_pipeline_artifacts.file, ci_pipeline_artifacts.expire_a
t, ci_pipeline_artifacts.verification_started_at, ci_pipeline_artifacts.verification_retry_at, ci_pipeline_artifacts.verified_at, ci_pipeline_artifacts.verification_state, ci_pipeline_artifacts.verification_retry_count, ci_pipeline_artifacts.verification_checksum, ci_pipeline_artifacts.verification_failure
   Inner Unique: true
   ->  Index Scan using ci_pipeline_artifacts_pkey on public.ci_pipeline_artifacts  (cost=0.43..756.33 rows=971 width=169) (actual time=16.828..207.054 rows=1039 loops=1)
         Output: ci_pipeline_artifacts.id, ci_pipeline_artifacts.created_at, ci_pipeline_artifacts.updated_at, ci_pipeline_artifacts.pipeline_id, ci_pipeline_artifacts.project_id, ci_pipeline_artifacts.size, ci_pipeline_artifacts.file_store, ci_pipeline_artifacts.file_type, ci_pipeline_artifacts.file_format, ci_pipeline_artifacts.file, ci_pipeline_artifacts.ex
pire_at, ci_pipeline_artifacts.verification_started_at, ci_pipeline_artifacts.verification_retry_at, ci_pipeline_artifacts.verified_at, ci_pipeline_artifacts.verification_state, ci_pipeline_artifacts.verification_retry_count, ci_pipeline_artifacts.verification_checksum, ci_pipeline_artifacts.verification_failure
         Index Cond: ((ci_pipeline_artifacts.id >= 1286033) AND (ci_pipeline_artifacts.id < 1299042))
         Filter: (ci_pipeline_artifacts.expire_at < '2022-09-06 18:23:02.375269+00'::timestamp with time zone)
   ->  Index Scan using ci_pipelines_pkey on public.ci_pipelines  (cost=0.57..3.60 rows=1 width=4) (actual time=1.357..1.357 rows=0 loops=1039)
         Output: ci_pipelines.id
         Index Cond: (ci_pipelines.id = ci_pipeline_artifacts.pipeline_id)
         Filter: (ci_pipelines.locked = 0)
         Rows Removed by Filter: 1
 Planning Time: 126.475 ms
 Execution Time: 1619.645 ms
(14 rows)

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Stan Hu

Merge request reports