Skip to content

Draft: Add partial index to ci_pending_builds table

Max Orefice requested to merge mo-add-partial-index-to-ci-pending-builds into master

Ref: #335293 (closed)

What does this MR do?

This MR creates a new partial index which will help use scaling our new ci queuing build system.

Why are we doing this?

Right now, we don't have a lot of records in our table (~10K/day).

In order to anticipate our future growth and avoid any database issues we should set up an index which will gets used as we continue to grow.

Database review

1️⃣ New index

DB migration
$ bin/rails db:migrate
== 20210831192049 AddInstanceRunnersAndCiMinutesPartialIndexToCiPendingBuild: migrating
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- execute("CREATE INDEX CONCURRENTLY index_ci_pending_builds_on_runners_and_ci_minutes_partial\nON ci_pending_builds (project_id) INCLUDE (id, build_id)\nWHERE instance_runners_enabled = TRUE AND minutes_exceeded = FALSE AND tag_ids = '{}'\n")
   -> 0.1590s
-- execute("RESET statement_timeout")
   -> 0.0006s
== 20210831192049 AddInstanceRunnersAndCiMinutesPartialIndexToCiPendingBuild: migrated (0.1609s)
DB rollback
$ bin/rails db:rollback
== 20210831192049 AddInstanceRunnersAndCiMinutesPartialIndexToCiPendingBuild: reverting
-- transaction_open?()
   -> 0.0000s
-- indexes(:ci_pending_builds)
   -> 0.0043s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- remove_index(:ci_pending_builds, {:algorithm=>:concurrently, :name=>"index_ci_pending_builds_on_runners_and_ci_minutes_partial"})
   -> 0.0043s
-- execute("RESET statement_timeout")
   -> 0.0004s
== 20210831192049 AddInstanceRunnersAndCiMinutesPartialIndexToCiPendingBuild: reverted (0.0132s)

Query impacted

After further investigations it seems that we always check for ci_minutes available when shared_runners is enabled so I think having a partial index could make sense here.

Query plan before: 45.706 ms
WITH project_builds AS MATERIALIZED (
    SELECT
        ci_running_builds.project_id,
        COUNT(
            *
) AS running_builds
    FROM
        ci_running_builds
    WHERE
        ci_running_builds.runner_type = 1
    GROUP BY
        ci_running_builds.project_id
)
SELECT
-   ci_pending_builds.*
FROM
    ci_pending_builds
    LEFT JOIN project_builds ON ci_pending_builds.project_id = project_builds.project_id
WHERE
    ci_pending_builds.instance_runners_enabled = TRUE
    AND ci_pending_builds.minutes_exceeded = FALSE
-   AND (tag_ids <@ ARRAY[NULL]::int[])
ORDER BY
    COALESCE(project_builds.running_builds, 0) ASC,
    ci_pending_builds.build_id ASC
 Sort  (cost=6097.57..6106.04 rows=3388 width=73) (actual time=43.834..43.839 rows=4 loops=1)
   Sort Key: (COALESCE(project_builds.running_builds, '0'::bigint)), ci_pending_builds.build_id
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=10296 read=1216 dirtied=503
   I/O Timings: read=29.546 write=0.000
   CTE project_builds
     ->  Aggregate  (cost=0.29..1114.44 rows=2990 width=16) (actual time=0.331..8.440 rows=913 loops=1)
           Group Key: ci_running_builds.project_id
           Buffers: shared hit=1893 read=205 dirtied=107
           I/O Timings: read=5.291 write=0.000
           ->  Index Scan using index_ci_running_builds_on_project_id on public.ci_running_builds  (cost=0.29..1034.93 rows=9922 width=8) (actual time=0.191..7.567 rows=3050 loops=1)
                 Filter: (ci_running_builds.runner_type = 1)
                 Rows Removed by Filter: 0
                 Buffers: shared hit=1893 read=205 dirtied=107
                 I/O Timings: read=5.291 write=0.000
   ->  Merge Left Join  (cost=232.83..4784.49 rows=3388 width=73) (actual time=26.028..43.813 rows=4 loops=1)
         Merge Cond: (ci_pending_builds.project_id = project_builds.project_id)
         Buffers: shared hit=10296 read=1216 dirtied=503
         I/O Timings: read=29.546 write=0.000
         ->  Index Scan using index_ci_pending_builds_on_project_id on public.ci_pending_builds  (cost=0.41..4516.92 rows=3388 width=65) (actual time=16.902..34.523 rows=4 loops=1)
               Filter: (ci_pending_builds.instance_runners_enabled AND (NOT ci_pending_builds.minutes_exceeded) AND (ci_pending_builds.tag_ids <@ '{NULL}'::integer[]))
               Rows Removed by Filter: 11411
               Buffers: shared hit=8403 read=1011 dirtied=396
               I/O Timings: read=24.254 write=0.000
         ->  Sort  (cost=232.41..239.89 rows=2990 width=16) (actual time=9.096..9.172 rows=911 loops=1)
               Sort Key: project_builds.project_id
               Sort Method: quicksort  Memory: 67kB
               Buffers: shared hit=1893 read=205 dirtied=107
               I/O Timings: read=5.291 write=0.000
               ->  CTE Scan on project_builds  (cost=0.00..59.80 rows=2990 width=16) (actual time=0.335..8.879 rows=913 loops=1)
                     Buffers: shared hit=1893 read=205 dirtied=107
                     I/O Timings: read=5.291 write=0.000
Query plan after: 2.330 ms
WITH project_builds AS MATERIALIZED
(
    SELECT
  ci_running_builds.project_id,
  COUNT(
            *
) AS running_builds
FROM
  ci_running_builds
WHERE
        ci_running_builds.runner_type = 1
GROUP BY
        ci_running_builds.project_id
)
SELECT
+ ci_pending_builds.id,
+ ci_pending_builds.build_id
FROM
  ci_pending_builds
  LEFT JOIN project_builds ON ci_pending_builds.project_id = project_builds.project_id
WHERE
    ci_pending_builds.instance_runners_enabled = TRUE
  AND ci_pending_builds.minutes_exceeded = FALSE
+ AND (tag_ids = '{}')
ORDER BY
    COALESCE(project_builds.running_builds, 0) ASC,
    ci_pending_builds.build_id ASC;
 Sort  (cost=773.41..775.97 rows=1021 width=24) (actual time=0.025..0.027 rows=0 loops=1)
   Sort Key: (COALESCE(project_builds.running_builds, '0'::bigint)), ci_pending_builds.build_id
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=1
   I/O Timings: read=0.000 write=0.000
   CTE project_builds
     ->  HashAggregate  (cost=453.69..462.52 rows=883 width=16) (actual time=0.000..0.000 rows=0 loops=0)
           Group Key: ci_running_builds.project_id
           I/O Timings: read=0.000 write=0.000
           ->  Seq Scan on public.ci_running_builds  (cost=0.00..436.06 rows=3525 width=8) (actual time=0.000..0.000 rows=0 loops=0)
                 Filter: (ci_running_builds.runner_type = 1)
                 Rows Removed by Filter: 0
                 I/O Timings: read=0.000 write=0.000
   ->  Merge Left Join  (cost=60.99..259.87 rows=1021 width=24) (actual time=0.016..0.016 rows=0 loops=1)
         Merge Cond: (ci_pending_builds.project_id = project_builds.project_id)
         Buffers: shared hit=1
         I/O Timings: read=0.000 write=0.000
         ->  Index Only Scan using index_ci_pending_builds_on_runners_and_ci_minutes_partial on public.ci_pending_builds  (cost=0.12..188.66 rows=1021 width=24) (actual time=0.014..0.015 rows=0 loops=1)
               Heap Fetches: 0
               Buffers: shared hit=1
               I/O Timings: read=0.000 write=0.000
         ->  Sort  (cost=60.87..63.07 rows=883 width=16) (actual time=0.000..0.000 rows=0 loops=0)
               Sort Key: project_builds.project_id
               I/O Timings: read=0.000 write=0.000
               ->  CTE Scan on project_builds  (cost=0.00..17.66 rows=883 width=16) (actual time=0.000..0.000 rows=0 loops=0)
                     I/O Timings: read=0.000 write=0.000

Here the steps I followed to try optimizing this query:

  1. Disabled sequential scanning on our thin-clone: exec SET enable_seqscan = off;
  2. Created new partial index
  3. Executed query to see if our index was used

Open to suggestions if we could accelerate this query even more 🙏

Does this MR meet the acceptance criteria?

Conformity

Edited by Max Orefice

Merge request reports