Skip to content

Add query index for ci_pipeline_schedules

Tianwen Chen requested to merge 338609-index-for-pipeline-schedule-worker into master

What does this MR do and why?

This is the second MR to add the query index index_ci_pipeline_schedules_on_id_and_next_run_at_and_active for ci_pipeline_schedules that's prepared in !106661 (diffs)

Queries and their performance

Before

  • SELECT "ci_pipeline_schedules"."id", "ci_pipeline_schedules"."owner_id" FROM "ci_pipeline_schedules" WHERE "ci_pipeline_schedules"."active" = TRUE AND (next_run_at < '2023-01-05 02:19:00') ORDER BY "ci_pipeline_schedules"."id" ASC LIMIT 1000

    The postgres.ai analyze report is at https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/14210/commands/49856

    The explain is:

     Limit  (cost=0.42..171.29 rows=1000 width=8) (actual time=0.317..30.370 rows=1000 loops=1)
       Buffers: shared hit=4859 read=22 dirtied=36
       I/O Timings: read=1.154 write=0.000
       ->  Index Scan using ci_pipeline_schedules_pkey on public.ci_pipeline_schedules  (cost=0.42..14468.22 rows=84670 width=8) (actual time=0.316..30.249 rows=1000 loops=1)
             Filter: (ci_pipeline_schedules.active AND (ci_pipeline_schedules.next_run_at < '2023-01-05 02:19:00'::timestamp without time zone))
             Rows Removed by Filter: 4302
             Buffers: shared hit=4859 read=22 dirtied=36
             I/O Timings: read=1.154 write=0.000
  • SELECT "ci_pipeline_schedules"."id", "ci_pipeline_schedules"."owner_id" FROM "ci_pipeline_schedules" WHERE "ci_pipeline_schedules"."active" = TRUE AND (next_run_at < '2023-01-05 02:19:00') AND "ci_pipeline_schedules"."id" > 10000 ORDER BY "ci_pipeline_schedules"."id" ASC LIMIT 1000

    The postgres.ai analyze report is at https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/14210/commands/49857

    The explain is:

     Limit  (cost=0.42..178.99 rows=1000 width=8) (actual time=0.040..3.928 rows=1000 loops=1)
       Buffers: shared hit=2911
       I/O Timings: read=0.000 write=0.000
       ->  Index Scan using ci_pipeline_schedules_pkey on public.ci_pipeline_schedules  (cost=0.42..14723.73 rows=82451 width=8) (actual time=0.039..3.837 rows=1000 loops=1)
             Index Cond: (ci_pipeline_schedules.id > 10000)
             Filter: (ci_pipeline_schedules.active AND (ci_pipeline_schedules.next_run_at < '2023-01-05 02:19:00'::timestamp without time zone))
             Rows Removed by Filter: 1950
             Buffers: shared hit=2911
             I/O Timings: read=0.000 write=0.000

After

If the index is added at postgres.ai as follow:

EXEC CREATE INDEX index_ci_pipeline_schedules_on_id_and_next_run_at_and_active ON ci_pipeline_schedules USING btree (id, next_run_at) WHERE (active = true);

The execution plans for the above queries become:

  • SELECT "ci_pipeline_schedules"."id", "ci_pipeline_schedules"."owner_id" FROM "ci_pipeline_schedules" WHERE "ci_pipeline_schedules"."active" = TRUE AND (next_run_at < '2023-01-05 02:19:00') ORDER BY "ci_pipeline_schedules"."id" ASC LIMIT 1000

    The postgres.ai analyze report is at https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/14210/commands/49860

    The explain is:

     Limit  (cost=0.42..127.70 rows=1000 width=8) (actual time=0.922..3.279 rows=1000 loops=1)
       Buffers: shared hit=935 read=5
       I/O Timings: read=0.956 write=0.000
       ->  Index Scan using index_ci_pipeline_schedules_on_id_and_next_run_at_and_active on public.ci_pipeline_schedules  (cost=0.42..9687.35 rows=76103 width=8) (actual time=0.921..3.181 rows=1000 loops=1)
             Index Cond: (ci_pipeline_schedules.next_run_at < '2023-01-05 02:19:00'::timestamp without time zone)
             Buffers: shared hit=935 read=5
             I/O Timings: read=0.956 write=0.000
  • SELECT "ci_pipeline_schedules"."id", "ci_pipeline_schedules"."owner_id" FROM "ci_pipeline_schedules" WHERE "ci_pipeline_schedules"."active" = TRUE AND (next_run_at < '2023-01-05 02:19:00') AND "ci_pipeline_schedules"."id" > 10000 ORDER BY "ci_pipeline_schedules"."id" ASC LIMIT 1000

    The postgres.ai analyze report is at https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/14210/commands/49859

    The explain is:

     Limit  (cost=0.42..133.50 rows=1000 width=8) (actual time=0.213..2.369 rows=1000 loops=1)
       Buffers: shared hit=945 read=9
       I/O Timings: read=0.335 write=0.000
       ->  Index Scan using index_ci_pipeline_schedules_on_id_and_next_run_at_and_active on public.ci_pipeline_schedules  (cost=0.42..9862.55 rows=74108 width=8) (actual time=0.212..2.269 rows=1000 loops=1)
             Index Cond: ((ci_pipeline_schedules.id > 10000) AND (ci_pipeline_schedules.next_run_at < '2023-01-05 02:19:00'::timestamp without time zone))
             Buffers: shared hit=945 read=9
             I/O Timings: read=0.335 write=0.000

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #338609 (closed)

Edited by Tianwen Chen

Merge request reports