Optimize the query for selecting a build in Resource Group
Problem
Resource group is a feature to limit the concurrency of the jobs. Currently, when the system picks a job for assigning a resource from resource group, it runs the following query, but they are slow, so that it could affect the reliability on GitLab CI/CD.
Rails:
resource_group.processables.waiting_for_resource.take(free_resources)
SQL:
SELECT
"ci_builds".*
FROM
"ci_builds"
WHERE
"ci_builds"."type" IN ('Ci::Processable', 'Ci::Bridge', 'Ci::Build')
AND "ci_builds"."resource_group_id" = 110209
AND "ci_builds"."status" = 'waiting_for_resource'
LIMIT 1;
Plan:
Limit (cost=175.05..176.56 rows=1 width=1288) (actual time=699.922..699.926 rows=0 loops=1)
Buffers: shared hit=116 read=586
I/O Timings: read=682.295 write=0.000
-> Bitmap Heap Scan on public.ci_builds (cost=175.05..176.56 rows=1 width=1288) (actual time=699.919..699.922 rows=0 loops=1)
Filter: ((ci_builds.type)::text = ANY ('{Ci::Processable,Ci::Bridge,Ci::Build}'::text[]))
Rows Removed by Filter: 0
Buffers: shared hit=116 read=586
I/O Timings: read=682.295 write=0.000
-> BitmapAnd (cost=175.05..175.05 rows=1 width=0) (actual time=699.837..699.839 rows=0 loops=1)
Buffers: shared hit=116 read=586
I/O Timings: read=682.295 write=0.000
-> Bitmap Index Scan using index_ci_builds_on_status_and_type_and_runner_id (cost=0.00..78.27 rows=4942 width=0) (actual time=691.724..691.724 rows=11938 loops=1)
Index Cond: ((ci_builds.status)::text = 'waiting_for_resource'::text)
Buffers: shared read=586
I/O Timings: read=682.295 write=0.000
-> Bitmap Index Scan using index_for_resource_group (cost=0.00..96.53 rows=6995 width=0) (actual time=7.277..7.277 rows=26342 loops=1)
Index Cond: (ci_builds.resource_group_id = 110209)
Buffers: shared hit=116
I/O Timings: read=0.000 write=0.000
Timing with cold cache:
Time: 700.942 ms
- planning: 0.816 ms
- execution: 700.126 ms
- I/O read: 682.295 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 116 (~928.00 KiB) from the buffer pool
- reads: 586 (~4.60 MiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
We're also adding process modes in Resource Group. A different process mode executes a slightly different query, like the following:
SELECT
"ci_builds".*
FROM
"ci_builds"
WHERE
"ci_builds"."type" IN ('Ci::Processable', 'Ci::Bridge', 'Ci::Build')
AND "ci_builds"."resource_group_id" = 110209
AND (
"ci_builds"."status" IN ('created', 'scheduled', 'waiting_for_resource')
)
ORDER BY
commit_id ASC,
CASE status WHEN 'waiting_for_resource' THEN 0 ELSE 1 END ASC
LIMIT
1;
Plan
Limit (cost=10480.13..10480.13 rows=1 width=1292) (actual time=71893.793..71893.818 rows=0 loops=1)
Buffers: shared hit=124 read=26321 dirtied=4
I/O Timings: read=71462.472 write=0.000
-> Sort (cost=10480.13..10480.43 rows=122 width=1292) (actual time=71893.782..71893.797 rows=0 loops=1)
Sort Key: ci_builds.commit_id, (CASE ci_builds.status WHEN 'waiting_for_resource'::text THEN 0 ELSE 1 END)
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=124 read=26321 dirtied=4
I/O Timings: read=71462.472 write=0.000
-> Index Scan using index_for_resource_group on public.ci_builds (cost=0.56..10479.52 rows=122 width=1292) (actual time=71893.725..71893.726 rows=0 loops=1)
Index Cond: (ci_builds.resource_group_id = 110209)
Filter: (((ci_builds.type)::text = ANY ('{Ci::Processable,Ci::Bridge,Ci::Build}'::text[])) AND ((ci_builds.status)::text = ANY ('{created,scheduled,waiting_for_resource}'::text[])))
Rows Removed by Filter: 26342
Buffers: shared hit=121 read=26321 dirtied=4
I/O Timings: read=71462.472 write=0.000
Timing with cold cache:
Time: 1.199 min
- planning: 35.168 ms
- execution: 1.198 min
- I/O read: 1.191 min
- I/O write: 0.000 ms
Shared buffers:
- hits: 124 (~992.00 KiB) from the buffer pool
- reads: 26321 (~205.60 MiB) from the OS file cache, including disk I/O
- dirtied: 4 (~32.00 KiB)
- writes: 0
We should optimize these queries to make the feature functional.
Proposal
- Add an index
exec index_for_resource_group_new ON ci_builds USING btree (resource_group_id, status, commit_id) WHERE (resource_group_id IS NOT NULL);
- Given that we're adding an index to
ci_builds
, which takes a long time, we need to use an async operation withprepare_async_index
.
- Given that we're adding an index to
- Remove the previous inefficient index
index_for_resource_group ON ci_builds USING btree (resource_group_id, id) WHERE (resource_group_id IS NOT NULL);
We have already confirmed that the new index improves the existing/new queries as the following:
The existing query
SELECT
"ci_builds".*
FROM
"ci_builds"
WHERE
"ci_builds"."resource_group_id" = 110209 AND "ci_builds"."status" = 'waiting_for_resource'
LIMIT 1;
Limit (cost=0.56..3.58 rows=1 width=1288) (actual time=7.203..7.205 rows=0 loops=1)
Buffers: shared hit=3 read=4
I/O Timings: read=7.056 write=0.000
-> Index Scan using index_for_resource_group_new on public.ci_builds (cost=0.56..3.58 rows=1 width=1288) (actual time=7.201..7.202 rows=0 loops=1)
Index Cond: ((ci_builds.resource_group_id = 110209) AND ((ci_builds.status)::text = 'waiting_for_resource'::text))
Buffers: shared hit=3 read=4
I/O Timings: read=7.056 write=0.000
When process_mode
is oldest_first
SELECT
"ci_builds".*
FROM
"ci_builds"
WHERE
"ci_builds"."resource_group_id" = 110209 AND ("ci_builds"."status" IN ('created', 'scheduled', 'waiting_for_resource'))
ORDER BY
commit_id ASC,
CASE status WHEN 'waiting_for_resource' THEN 0 ELSE 1 END ASC
LIMIT 1;
Limit (cost=194.42..194.42 rows=1 width=1292) (actual time=2.572..2.574 rows=0 loops=1)
Buffers: shared hit=18 read=3
I/O Timings: read=2.292 write=0.000
-> Sort (cost=194.42..194.73 rows=126 width=1292) (actual time=2.569..2.571 rows=0 loops=1)
Sort Key: ci_builds.commit_id, (CASE ci_builds.status WHEN 'waiting_for_resource'::text THEN 0 ELSE 1 END)
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=18 read=3
I/O Timings: read=2.292 write=0.000
-> Index Scan using index_for_resource_group_new on public.ci_builds (cost=0.56..193.79 rows=126 width=1292) (actual time=2.547..2.548 rows=0 loops=1)
Index Cond: ((ci_builds.resource_group_id = 110209) AND ((ci_builds.status)::text = ANY ('{created,scheduled,waiting_for_resource}'::text[])))
Buffers: shared hit=15 read=3
I/O Timings: read=2.292 write=0.000
Edited by Shinya Maeda