Skip to content

Add indexes on deployments to improve environments search

Stan Hu requested to merge sh-add-index-for-deployments into master

To load diffs and other files, we make two SELECT calls in MergeRequest#environments_for: one for the source branch, and one for the target branch.

The source branch query makes a subquery:

SELECT environment_id FROM deployments WHERE project_id = 123 AND ref = 'source_branch'

The target branch makes this subquery:

SELECT environment_id FROM deployments WHERE project_id = 123 AND (ref = 'master' OR tag IS TRUE)

Both subqueries were taking over 250 ms to execute. When we add an index for the deployments table on (projects_id, ref), this execution drops down to under 5 ms since the table usually has a small number of rows for the source branch.

Adding a partial index on projects_id WHERE tag IS TRUE allows PostgreSQL to perform two separate indexed scans: one for the ref, and one for the tag condition. However, this query still takes 60 ms to run for the gitlab-org/gitlab project because master has 10,000+ rows. We should consider adding a LIMIT in a future iteration.

Closes #55353 (closed)

Details

Source query

SELECT
  "environments".*
FROM
  "environments"
WHERE
  "environments"."project_id" = 278964
  AND ("environments"."state" IN ('available'))
  AND "environments"."id" IN (
    SELECT
      "deployments"."environment_id"
    FROM
      "deployments"
    WHERE
      "deployments"."project_id" = 278964
      AND (ref = 'sh-optimize-commit-is-ancestor-env')
    GROUP BY
      "deployments"."environment_id"
  )
ORDER BY
  (
    SELECT
      MAX("deployments"."id")
    FROM
      "deployments"
    WHERE
      "deployments"."environment_id" = "environments"."id"
  ) ASC NULLS FIRST

Query plan:

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=139519.61..139519.61 rows=1 width=148) (actual time=127.854..127.854 rows=0 loops=1)
   Sort Key: ((SubPlan 2)) NULLS FIRST
   Sort Method: quicksort  Memory: 25kB
   ->  Nested Loop  (cost=139515.94..139519.60 rows=1 width=148) (actual time=127.811..127.811 rows=0 loops=1)
         ->  Group  (cost=139515.51..139515.52 rows=1 width=4) (actual time=127.811..127.811 rows=0 loops=1)
               Group Key: deployments.environment_id
               ->  Sort  (cost=139515.51..139515.52 rows=1 width=4) (actual time=127.810..127.810 rows=0 loops=1)
                     Sort Key: deployments.environment_id
                     Sort Method: quicksort  Memory: 25kB
                     ->  Index Scan using index_deployments_on_project_id_and_id on deployments  (cost=0.56..139515.50 rows=1 width=4) (actual time=127.806..127.806 rows=0 loops=1)
                           Index Cond: (project_id = 278964)
                           Filter: ((ref)::text = 'sh-optimize-commit-is-ancestor-env'::text)
                           Rows Removed by Filter: 104862
         ->  Index Scan using environments_pkey on environments  (cost=0.42..3.45 rows=1 width=144) (never executed)
               Index Cond: (id = deployments.environment_id)
               Filter: ((project_id = 278964) AND ((state)::text = 'available'::text))
         SubPlan 2
           ->  Result  (cost=0.60..0.61 rows=1 width=4) (never executed)
                 InitPlan 1 (returns $1)
                   ->  Limit  (cost=0.56..0.60 rows=1 width=4) (never executed)
                         ->  Index Only Scan Backward using index_deployments_on_environment_id_and_id on deployments deployments_1  (cost=0.56..12.15 rows=354 width=4) (never executed)
                               Index Cond: ((environment_id = environments.id) AND (id IS NOT NULL))
                               Heap Fetches: 0
 Planning time: 0.496 ms
 Execution time: 127.912 ms
(25 rows)

After

 Sort  (cost=9.75..9.75 rows=1 width=140) (actual time=0.403..0.403 rows=0 loops=1)
   Sort Key: ((SubPlan 2)) NULLS FIRST
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=3 read=4
   I/O Timings: read=0.231
   ->  Nested Loop  (cost=5.02..9.74 rows=1 width=140) (actual time=0.329..0.329 rows=0 loops=1)
         Buffers: shared read=4
         I/O Timings: read=0.231
         ->  Group  (cost=4.59..4.60 rows=1 width=4) (actual time=0.328..0.328 rows=0 loops=1)
               Group Key: deployments.environment_id
               Buffers: shared read=4
               I/O Timings: read=0.231
               ->  Sort  (cost=4.59..4.60 rows=1 width=4) (actual time=0.327..0.327 rows=0 loops=1)
                     Sort Key: deployments.environment_id
                     Sort Method: quicksort  Memory: 25kB
                     Buffers: shared read=4
                     I/O Timings: read=0.231
                     ->  Index Scan using deployments_test_index on public.deployments  (cost=0.56..4.58 rows=1 width=4) (actual time=0.320..0.320 rows=0 loops=1)
                           Index Cond: ((deployments.project_id = 278964) AND ((deployments.ref)::text = 'sh-optimize-commit-is-ancestor-env'::text))
                           Buffers: shared read=4
                           I/O Timings: read=0.231
         ->  Index Scan using environments_pkey on public.environments  (cost=0.42..4.45 rows=1 width=136) (actual time=0.000..0.000 rows=0 loops=0)
               Index Cond: (environments.id = deployments.environment_id)
               Filter: ((environments.project_id = 278964) AND ((environments.state)::text = 'available'::text))
               Rows Removed by Filter: 0
         SubPlan 2
           ->  Result  (cost=0.66..0.67 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                 InitPlan 1 (returns $1)
                   ->  Limit  (cost=0.56..0.66 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                         ->  Index Only Scan using index_deployments_on_environment_id_and_id on public.deployments deployments_1  (cost=0.56..33.13 rows=349 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                               Index Cond: ((deployments_1.environment_id = environments.id) AND (deployments_1.id IS NOT NULL))
                               Heap Fetches: 0

Summary:

Time: 3.378 ms
  - planning: 2.879 ms
  - execution: 0.499 ms
    - I/O read: 0.231 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 3 (~24.00 KiB) from the buffer pool
  - reads: 4 (~32.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Target query

SELECT
  "environments".*
FROM
  "environments"
WHERE
  "environments"."project_id" = 278964
  AND ("environments"."state" IN ('available'))
  AND "environments"."id" IN (
    SELECT
      "deployments"."environment_id"
    FROM
      "deployments"
    WHERE
      "deployments"."project_id" = 278964
      AND (
        ref = 'master'
        OR tag IS TRUE
      )
    GROUP BY
      "deployments"."environment_id"
  )
ORDER BY
  (
    SELECT
      MAX("deployments"."id")
    FROM
      "deployments"
    WHERE
      "deployments"."environment_id" = "environments"."id"
  ) ASC NULLS FIRST
                                                                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=150679.28..150688.41 rows=3653 width=148) (actual time=165.995..165.995 rows=5 loops=1)
   Sort Key: ((SubPlan 2)) NULLS FIRST
   Sort Method: quicksort  Memory: 25kB
   ->  Hash Join  (cost=140565.26..150463.12 rows=3653 width=148) (actual time=142.417..165.980 rows=5 loops=1)
         Hash Cond: (environments.id = deployments.environment_id)
         ->  Index Scan using index_environments_on_project_id_state_environment_type on environments  (cost=0.42..7648.78 rows=7306 width=144) (actual time=0.070..26.446 rows=10295 loops=1)
               Index Cond: ((project_id = 278964) AND ((state)::text = 'available'::text))
         ->  Hash  (cost=140194.15..140194.15 rows=29655 width=4) (actual time=137.117..137.117 rows=106 loops=1)
               Buckets: 32768  Batches: 1  Memory Usage: 260kB
               ->  HashAggregate  (cost=139601.05..139897.60 rows=29655 width=4) (actual time=136.995..137.071 rows=106 loops=1)
                     Group Key: deployments.environment_id
                     ->  Index Scan using index_deployments_on_project_id_and_id on deployments  (cost=0.56..139515.50 rows=34219 width=4) (actual time=0.033..135.166 rows=10448 loops=1)
                           Index Cond: (project_id = 278964)
                           Filter: (((ref)::text = 'master'::text) OR (tag IS TRUE))
                           Rows Removed by Filter: 94414
         SubPlan 2
           ->  Result  (cost=0.60..0.61 rows=1 width=4) (actual time=0.032..0.032 rows=1 loops=5)
                 InitPlan 1 (returns $1)
                   ->  Limit  (cost=0.56..0.60 rows=1 width=4) (actual time=0.030..0.030 rows=1 loops=5)
                         ->  Index Only Scan Backward using index_deployments_on_environment_id_and_id on deployments deployments_1  (cost=0.56..12.15 rows=354 width=4) (actual time=0.028..0.028 rows=1 loops=5)
                               Index Cond: ((environment_id = environments.id) AND (id IS NOT NULL))
                               Heap Fetches: 1
 Planning time: 0.553 ms
 Execution time: 166.281 ms
(24 rows)

After

 Sort  (cost=97786.44..97795.38 rows=3573 width=148) (actual time=60.560..60.561 rows=5 loops=1)
   Sort Key: ((SubPlan 2)) NULLS FIRST
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=17484
   ->  Hash Join  (cost=85261.68..97575.59 rows=3573 width=148) (actual time=35.476..60.514 rows=5 loops=1)
         Hash Cond: (environments.id = deployments.environment_id)
         Buffers: shared hit=17481
         ->  Index Scan using index_environments_on_project_id_state_environment_type on public.environments  (cost=0.42..9935.40 rows=7146 width=144) (actual time=0.033..24.525 rows=10257 loops=1)
               Index Cond: ((environments.project_id = 278964) AND ((environments.state)::text = 'available'::text))
               Buffers: shared hit=7474
         ->  Hash  (cost=84889.06..84889.06 rows=29776 width=4) (actual time=32.444..32.444 rows=106 loops=1)
               Buckets: 32768  Batches: 1  Memory Usage: 260kB
               Buffers: shared hit=9983
               ->  HashAggregate  (cost=84293.54..84591.30 rows=29776 width=4) (actual time=32.297..32.392 rows=106 loops=1)
                     Group Key: deployments.environment_id
                     Buffers: shared hit=9983
                     ->  Bitmap Heap Scan on public.deployments  (cost=687.51..84207.58 rows=34382 width=4) (actual time=3.896..28.439 rows=10457 loops=1)
                           Buffers: shared hit=9983
                           ->  BitmapOr  (cost=687.51..687.51 rows=35284 width=0) (actual time=2.168..2.168 rows=0 loops=1)
                                 Buffers: shared hit=47
                                 ->  Bitmap Index Scan using i1  (cost=0.00..629.34 rows=32277 width=0) (actual time=2.161..2.161 rows=10457 loops=1)
                                       Index Cond: ((deployments.project_id = 278964) AND ((deployments.ref)::text = 'master'::text))
                                       Buffers: shared hit=44
                                 ->  Bitmap Index Scan using i2  (cost=0.00..40.98 rows=3007 width=0) (actual time=0.007..0.007 rows=0 loops=1)
                                       Index Cond: (deployments.project_id = 278964)
                                       Buffers: shared hit=3
         SubPlan 2
           ->  Result  (cost=0.65..0.66 rows=1 width=4) (actual time=0.040..0.040 rows=1 loops=5)
                 Buffers: shared hit=24
                 InitPlan 1 (returns $1)
                   ->  Limit  (cost=0.56..0.65 rows=1 width=4) (actual time=0.037..0.038 rows=1 loops=5)
                         Buffers: shared hit=24
                         ->  Index Only Scan using index_deployments_on_environment_id_and_id on public.deployments deployments_1  (cost=0.56..30.01 rows=356 width=4) (actual time=0.033..0.033 rows=1 loops=5)
                               Index Cond: ((deployments_1.environment_id = environments.id) AND (deployments_1.id IS NOT NULL))
                               Heap Fetches: 1
                               Buffers: shared hit=24
Time: 64.050 ms
  - planning: 3.077 ms
  - execution: 60.973 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 17484 (~136.60 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
Edited by Stan Hu

Merge request reports