Skip to content

Optimize the query for selecting a build in Resource Group

Problem

Resource group is a feature to limit the concurrency of the jobs. Currently, when the system picks a job for assigning a resource from resource group, it runs the following query, but they are slow, so that it could affect the reliability on GitLab CI/CD.

Rails:

resource_group.processables.waiting_for_resource.take(free_resources)

SQL:

SELECT
    "ci_builds".*
FROM
    "ci_builds"
WHERE
    "ci_builds"."type" IN ('Ci::Processable', 'Ci::Bridge', 'Ci::Build')
    AND "ci_builds"."resource_group_id" = 110209
    AND "ci_builds"."status" = 'waiting_for_resource'
LIMIT 1;

Plan:

 Limit  (cost=175.05..176.56 rows=1 width=1288) (actual time=699.922..699.926 rows=0 loops=1)
   Buffers: shared hit=116 read=586
   I/O Timings: read=682.295 write=0.000
   ->  Bitmap Heap Scan on public.ci_builds  (cost=175.05..176.56 rows=1 width=1288) (actual time=699.919..699.922 rows=0 loops=1)
         Filter: ((ci_builds.type)::text = ANY ('{Ci::Processable,Ci::Bridge,Ci::Build}'::text[]))
         Rows Removed by Filter: 0
         Buffers: shared hit=116 read=586
         I/O Timings: read=682.295 write=0.000
         ->  BitmapAnd  (cost=175.05..175.05 rows=1 width=0) (actual time=699.837..699.839 rows=0 loops=1)
               Buffers: shared hit=116 read=586
               I/O Timings: read=682.295 write=0.000
               ->  Bitmap Index Scan using index_ci_builds_on_status_and_type_and_runner_id  (cost=0.00..78.27 rows=4942 width=0) (actual time=691.724..691.724 rows=11938 loops=1)
                     Index Cond: ((ci_builds.status)::text = 'waiting_for_resource'::text)
                     Buffers: shared read=586
                     I/O Timings: read=682.295 write=0.000
               ->  Bitmap Index Scan using index_for_resource_group  (cost=0.00..96.53 rows=6995 width=0) (actual time=7.277..7.277 rows=26342 loops=1)
                     Index Cond: (ci_builds.resource_group_id = 110209)
                     Buffers: shared hit=116
                     I/O Timings: read=0.000 write=0.000

Timing with cold cache:

Time: 700.942 ms
  - planning: 0.816 ms
  - execution: 700.126 ms
    - I/O read: 682.295 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 116 (~928.00 KiB) from the buffer pool
  - reads: 586 (~4.60 MiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

We're also adding process modes in Resource Group. A different process mode executes a slightly different query, like the following:

SELECT
    "ci_builds".*
FROM
    "ci_builds"
WHERE
    "ci_builds"."type" IN ('Ci::Processable', 'Ci::Bridge', 'Ci::Build')
    AND "ci_builds"."resource_group_id" = 110209
    AND (
        "ci_builds"."status" IN ('created', 'scheduled', 'waiting_for_resource')
    )
ORDER BY
    commit_id ASC,
    CASE status WHEN 'waiting_for_resource' THEN 0 ELSE 1 END ASC
LIMIT
    1;

Plan

 Limit  (cost=10480.13..10480.13 rows=1 width=1292) (actual time=71893.793..71893.818 rows=0 loops=1)
   Buffers: shared hit=124 read=26321 dirtied=4
   I/O Timings: read=71462.472 write=0.000
   ->  Sort  (cost=10480.13..10480.43 rows=122 width=1292) (actual time=71893.782..71893.797 rows=0 loops=1)
         Sort Key: ci_builds.commit_id, (CASE ci_builds.status WHEN 'waiting_for_resource'::text THEN 0 ELSE 1 END)
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=124 read=26321 dirtied=4
         I/O Timings: read=71462.472 write=0.000
         ->  Index Scan using index_for_resource_group on public.ci_builds  (cost=0.56..10479.52 rows=122 width=1292) (actual time=71893.725..71893.726 rows=0 loops=1)
               Index Cond: (ci_builds.resource_group_id = 110209)
               Filter: (((ci_builds.type)::text = ANY ('{Ci::Processable,Ci::Bridge,Ci::Build}'::text[])) AND ((ci_builds.status)::text = ANY ('{created,scheduled,waiting_for_resource}'::text[])))
               Rows Removed by Filter: 26342
               Buffers: shared hit=121 read=26321 dirtied=4
               I/O Timings: read=71462.472 write=0.000

Timing with cold cache:

Time: 1.199 min
  - planning: 35.168 ms
  - execution: 1.198 min
    - I/O read: 1.191 min
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 124 (~992.00 KiB) from the buffer pool
  - reads: 26321 (~205.60 MiB) from the OS file cache, including disk I/O
  - dirtied: 4 (~32.00 KiB)
  - writes: 0

We should optimize these queries to make the feature functional.

Proposal

  • Add an index exec index_for_resource_group_new ON ci_builds USING btree (resource_group_id, status, commit_id) WHERE (resource_group_id IS NOT NULL);
    • Given that we're adding an index to ci_builds, which takes a long time, we need to use an async operation with prepare_async_index.
  • Remove the previous inefficient index index_for_resource_group ON ci_builds USING btree (resource_group_id, id) WHERE (resource_group_id IS NOT NULL);

We have already confirmed that the new index improves the existing/new queries as the following:

The existing query

SELECT
"ci_builds".*
FROM
"ci_builds"
WHERE
"ci_builds"."resource_group_id" = 110209 AND "ci_builds"."status" = 'waiting_for_resource'
LIMIT 1;
 Limit  (cost=0.56..3.58 rows=1 width=1288) (actual time=7.203..7.205 rows=0 loops=1)
   Buffers: shared hit=3 read=4
   I/O Timings: read=7.056 write=0.000
   ->  Index Scan using index_for_resource_group_new on public.ci_builds  (cost=0.56..3.58 rows=1 width=1288) (actual time=7.201..7.202 rows=0 loops=1)
         Index Cond: ((ci_builds.resource_group_id = 110209) AND ((ci_builds.status)::text = 'waiting_for_resource'::text))
         Buffers: shared hit=3 read=4
         I/O Timings: read=7.056 write=0.000

When process_mode is oldest_first

SELECT
"ci_builds".*
FROM
"ci_builds"
WHERE
"ci_builds"."resource_group_id" = 110209 AND ("ci_builds"."status" IN ('created', 'scheduled', 'waiting_for_resource'))
ORDER BY
commit_id ASC,
CASE status WHEN 'waiting_for_resource' THEN 0 ELSE 1 END ASC
LIMIT 1;
 Limit  (cost=194.42..194.42 rows=1 width=1292) (actual time=2.572..2.574 rows=0 loops=1)
   Buffers: shared hit=18 read=3
   I/O Timings: read=2.292 write=0.000
   ->  Sort  (cost=194.42..194.73 rows=126 width=1292) (actual time=2.569..2.571 rows=0 loops=1)
         Sort Key: ci_builds.commit_id, (CASE ci_builds.status WHEN 'waiting_for_resource'::text THEN 0 ELSE 1 END)
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=18 read=3
         I/O Timings: read=2.292 write=0.000
         ->  Index Scan using index_for_resource_group_new on public.ci_builds  (cost=0.56..193.79 rows=126 width=1292) (actual time=2.547..2.548 rows=0 loops=1)
               Index Cond: ((ci_builds.resource_group_id = 110209) AND ((ci_builds.status)::text = ANY ('{created,scheduled,waiting_for_resource}'::text[])))
               Buffers: shared hit=15 read=3
               I/O Timings: read=2.292 write=0.000
Edited by Shinya Maeda