Draft: Add partial index to ci_pending_builds table
Ref: #335293 (closed)
What does this MR do?
This MR creates a new partial index which will help use scaling our new ci queuing build system.
Why are we doing this?
Right now, we don't have a lot of records in our table (~10K/day).
In order to anticipate our future growth and avoid any database issues we should set up an index which will gets used as we continue to grow.
Database review
1️⃣ New index
DB migration
$ bin/rails db:migrate
== 20210831192049 AddInstanceRunnersAndCiMinutesPartialIndexToCiPendingBuild: migrating
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- execute("CREATE INDEX CONCURRENTLY index_ci_pending_builds_on_runners_and_ci_minutes_partial\nON ci_pending_builds (project_id) INCLUDE (id, build_id)\nWHERE instance_runners_enabled = TRUE AND minutes_exceeded = FALSE AND tag_ids = '{}'\n")
-> 0.1590s
-- execute("RESET statement_timeout")
-> 0.0006s
== 20210831192049 AddInstanceRunnersAndCiMinutesPartialIndexToCiPendingBuild: migrated (0.1609s)
DB rollback
$ bin/rails db:rollback
== 20210831192049 AddInstanceRunnersAndCiMinutesPartialIndexToCiPendingBuild: reverting
-- transaction_open?()
-> 0.0000s
-- indexes(:ci_pending_builds)
-> 0.0043s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- remove_index(:ci_pending_builds, {:algorithm=>:concurrently, :name=>"index_ci_pending_builds_on_runners_and_ci_minutes_partial"})
-> 0.0043s
-- execute("RESET statement_timeout")
-> 0.0004s
== 20210831192049 AddInstanceRunnersAndCiMinutesPartialIndexToCiPendingBuild: reverted (0.0132s)
Query impacted
After further investigations it seems that we always check for ci_minutes
available when shared_runners
is enabled so I think having a partial index could make sense here.
Query plan before: 45.706 ms
WITH project_builds AS MATERIALIZED (
SELECT
ci_running_builds.project_id,
COUNT(
*
) AS running_builds
FROM
ci_running_builds
WHERE
ci_running_builds.runner_type = 1
GROUP BY
ci_running_builds.project_id
)
SELECT
- ci_pending_builds.*
FROM
ci_pending_builds
LEFT JOIN project_builds ON ci_pending_builds.project_id = project_builds.project_id
WHERE
ci_pending_builds.instance_runners_enabled = TRUE
AND ci_pending_builds.minutes_exceeded = FALSE
- AND (tag_ids <@ ARRAY[NULL]::int[])
ORDER BY
COALESCE(project_builds.running_builds, 0) ASC,
ci_pending_builds.build_id ASC
Sort (cost=6097.57..6106.04 rows=3388 width=73) (actual time=43.834..43.839 rows=4 loops=1)
Sort Key: (COALESCE(project_builds.running_builds, '0'::bigint)), ci_pending_builds.build_id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=10296 read=1216 dirtied=503
I/O Timings: read=29.546 write=0.000
CTE project_builds
-> Aggregate (cost=0.29..1114.44 rows=2990 width=16) (actual time=0.331..8.440 rows=913 loops=1)
Group Key: ci_running_builds.project_id
Buffers: shared hit=1893 read=205 dirtied=107
I/O Timings: read=5.291 write=0.000
-> Index Scan using index_ci_running_builds_on_project_id on public.ci_running_builds (cost=0.29..1034.93 rows=9922 width=8) (actual time=0.191..7.567 rows=3050 loops=1)
Filter: (ci_running_builds.runner_type = 1)
Rows Removed by Filter: 0
Buffers: shared hit=1893 read=205 dirtied=107
I/O Timings: read=5.291 write=0.000
-> Merge Left Join (cost=232.83..4784.49 rows=3388 width=73) (actual time=26.028..43.813 rows=4 loops=1)
Merge Cond: (ci_pending_builds.project_id = project_builds.project_id)
Buffers: shared hit=10296 read=1216 dirtied=503
I/O Timings: read=29.546 write=0.000
-> Index Scan using index_ci_pending_builds_on_project_id on public.ci_pending_builds (cost=0.41..4516.92 rows=3388 width=65) (actual time=16.902..34.523 rows=4 loops=1)
Filter: (ci_pending_builds.instance_runners_enabled AND (NOT ci_pending_builds.minutes_exceeded) AND (ci_pending_builds.tag_ids <@ '{NULL}'::integer[]))
Rows Removed by Filter: 11411
Buffers: shared hit=8403 read=1011 dirtied=396
I/O Timings: read=24.254 write=0.000
-> Sort (cost=232.41..239.89 rows=2990 width=16) (actual time=9.096..9.172 rows=911 loops=1)
Sort Key: project_builds.project_id
Sort Method: quicksort Memory: 67kB
Buffers: shared hit=1893 read=205 dirtied=107
I/O Timings: read=5.291 write=0.000
-> CTE Scan on project_builds (cost=0.00..59.80 rows=2990 width=16) (actual time=0.335..8.879 rows=913 loops=1)
Buffers: shared hit=1893 read=205 dirtied=107
I/O Timings: read=5.291 write=0.000
Query plan after: 2.330 ms
WITH project_builds AS MATERIALIZED
(
SELECT
ci_running_builds.project_id,
COUNT(
*
) AS running_builds
FROM
ci_running_builds
WHERE
ci_running_builds.runner_type = 1
GROUP BY
ci_running_builds.project_id
)
SELECT
+ ci_pending_builds.id,
+ ci_pending_builds.build_id
FROM
ci_pending_builds
LEFT JOIN project_builds ON ci_pending_builds.project_id = project_builds.project_id
WHERE
ci_pending_builds.instance_runners_enabled = TRUE
AND ci_pending_builds.minutes_exceeded = FALSE
+ AND (tag_ids = '{}')
ORDER BY
COALESCE(project_builds.running_builds, 0) ASC,
ci_pending_builds.build_id ASC;
Sort (cost=773.41..775.97 rows=1021 width=24) (actual time=0.025..0.027 rows=0 loops=1)
Sort Key: (COALESCE(project_builds.running_builds, '0'::bigint)), ci_pending_builds.build_id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=1
I/O Timings: read=0.000 write=0.000
CTE project_builds
-> HashAggregate (cost=453.69..462.52 rows=883 width=16) (actual time=0.000..0.000 rows=0 loops=0)
Group Key: ci_running_builds.project_id
I/O Timings: read=0.000 write=0.000
-> Seq Scan on public.ci_running_builds (cost=0.00..436.06 rows=3525 width=8) (actual time=0.000..0.000 rows=0 loops=0)
Filter: (ci_running_builds.runner_type = 1)
Rows Removed by Filter: 0
I/O Timings: read=0.000 write=0.000
-> Merge Left Join (cost=60.99..259.87 rows=1021 width=24) (actual time=0.016..0.016 rows=0 loops=1)
Merge Cond: (ci_pending_builds.project_id = project_builds.project_id)
Buffers: shared hit=1
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using index_ci_pending_builds_on_runners_and_ci_minutes_partial on public.ci_pending_builds (cost=0.12..188.66 rows=1021 width=24) (actual time=0.014..0.015 rows=0 loops=1)
Heap Fetches: 0
Buffers: shared hit=1
I/O Timings: read=0.000 write=0.000
-> Sort (cost=60.87..63.07 rows=883 width=16) (actual time=0.000..0.000 rows=0 loops=0)
Sort Key: project_builds.project_id
I/O Timings: read=0.000 write=0.000
-> CTE Scan on project_builds (cost=0.00..17.66 rows=883 width=16) (actual time=0.000..0.000 rows=0 loops=0)
I/O Timings: read=0.000 write=0.000
Here the steps I followed to try optimizing this query:
- Disabled sequential scanning on our thin-clone:
exec SET enable_seqscan = off;
- Created new partial index
- Executed query to see if our index was used
Open to suggestions if we could accelerate this query even more
Does this MR meet the acceptance criteria?
Conformity
-
I have included changelog trailers, or none are needed. (Does this MR need a changelog?) - [-] I have added/updated documentation, or it's not needed. (Is documentation required?)
- [-] I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?)
-
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides. -
This change is backwards compatible across updates, or this does not apply.
Edited by Max Orefice