Skip to content

Add index for sha to deployments

Marc Shaw requested to merge add_index_for_sha_to_deployments into master

What does this MR do?

Adds an index to the sha column for deployments

When looking for environments on projects with a large number of deployments (ie our main gitlab repo, but not other repos), the query is rather slow.

A particular page this effects is the commits show page on the main gitlab repo, ie:

dc267ace

We can see the query takes 700 MS

The place this query is called: https://gitlab.com/gitlab-org/gitlab/-/blob/21d3afd31104179a1b7496a67bb31d605ed9951a/app/finders/environments_finder.rb

The query in question:

SELECT "environments".* FROM "environments" WHERE "environments"."project_id" = 278964 AND ("environments"."state" IN ('available')) AND "environments"."id" IN (SELECT "deployments"."environment_id" FROM "deployments" WHERE "deployments"."project_id" = 278964 AND "deployments"."sha" = 'dc267aceb79eb167c42438be0807d46df6cba56f' GROUP BY "deployments"."environment_id") ORDER BY (SELECT MAX("deployments"."id") FROM "deployments" WHERE "deployments"."environment_id" = "environments"."id") DESC NULLS LAST 

Looking at this query with/without this added index, with cold cache we get:

Without new index

Time: 3.150 min
  - planning: 0.936 ms
  - execution: 3.149 min
    - I/O read: 3.111 min
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 14266 (~111.50 MiB) from the buffer pool
  - reads: 160831 (~1.20 GiB) from the OS file cache, including disk I/O
  - dirtied: 2991 (~23.40 MiB)
  - writes: 0
 Sort  (cost=272623.03..272623.04 rows=1 width=152) (actual time=188969.308..188969.308 rows=0 loops=1)
   Sort Key: ((SubPlan 2)) DESC NULLS LAST
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=14266 read=160831 dirtied=2991
   I/O Timings: read=186653.389
   ->  Nested Loop  (cost=272619.35..272623.02 rows=1 width=152) (actual time=188969.251..188969.251 rows=0 loops=1)
         Buffers: shared hit=14263 read=160831 dirtied=2991
         I/O Timings: read=186653.389
         ->  Group  (cost=272618.92..272618.93 rows=1 width=4) (actual time=188969.249..188969.249 rows=0 loops=1)
               Group Key: deployments.environment_id
               Buffers: shared hit=14263 read=160831 dirtied=2991
               I/O Timings: read=186653.389
               ->  Sort  (cost=272618.92..272618.93 rows=1 width=4) (actual time=188969.246..188969.247 rows=0 loops=1)
                     Sort Key: deployments.environment_id
                     Sort Method: quicksort  Memory: 25kB
                     Buffers: shared hit=14263 read=160831 dirtied=2991
                     I/O Timings: read=186653.389
                     ->  Index Scan using index_deployments_on_project_id_and_id on public.deployments  (cost=0.57..272618.91 rows=1 width=4) (actual time=188969.233..188969.233 rows=0 loops=1)
                           Index Cond: (deployments.project_id = 278964)
                           Filter: ((deployments.sha)::text = 'dc267aceb79eb167c42438be0807d46df6cba56f'::text)
                           Rows Removed by Filter: 211871
                           Buffers: shared hit=14263 read=160831 dirtied=2991
                           I/O Timings: read=186653.389
         ->  Index Scan using environments_pkey on public.environments  (cost=0.43..3.45 rows=1 width=148) (actual time=0.000..0.000 rows=0 loops=0)
               Index Cond: (environments.id = deployments.environment_id)
               Filter: ((environments.project_id = 278964) AND ((environments.state)::text = 'available'::text))
               Rows Removed by Filter: 0
         SubPlan 2
           ->  Result  (cost=0.62..0.63 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                 InitPlan 1 (returns $1)
                   ->  Limit  (cost=0.57..0.62 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                         ->  Index Only Scan using index_deployments_on_environment_id_and_id on public.deployments deployments_1  (cost=0.57..33.25 rows=600 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                               Index Cond: ((deployments_1.environment_id = environments.id) AND (deployments_1.id IS NOT NULL))
                               Heap Fetches: 0

With new index

Size of Index

List of relations
 Schema |                Name                 | Type  | Owner  |    Table    |  Size   | Description 
--------+-------------------------------------+-------+--------+-------------+---------+-------------
 public | index_deployments_on_project_id_sha | index | gitlab | deployments | 5965 MB | 
(1 row)

Time taken to create index

The query has been executed. Duration: 9.319 min
Time: 1.696 ms
  - planning: 1.082 ms
  - execution: 0.614 ms
    - I/O read: 0.429 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 0 from the buffer pool
  - reads: 4 (~32.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
 Sort  (cost=7.72..7.72 rows=1 width=152) (actual time=0.531..0.531 rows=0 loops=1)
   Sort Key: ((SubPlan 2)) DESC NULLS LAST
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared read=4
   I/O Timings: read=0.429
   ->  Nested Loop  (cost=4.02..7.71 rows=1 width=152) (actual time=0.516..0.516 rows=0 loops=1)
         Buffers: shared read=4
         I/O Timings: read=0.429
         ->  Group  (cost=3.60..3.60 rows=1 width=4) (actual time=0.516..0.516 rows=0 loops=1)
               Group Key: deployments.environment_id
               Buffers: shared read=4
               I/O Timings: read=0.429
               ->  Sort  (cost=3.60..3.60 rows=1 width=4) (actual time=0.515..0.515 rows=0 loops=1)
                     Sort Key: deployments.environment_id
                     Sort Method: quicksort  Memory: 25kB
                     Buffers: shared read=4
                     I/O Timings: read=0.429
                     ->  Index Scan using index_deployments_on_project_id_sha on public.deployments  (cost=0.57..3.59 rows=1 width=4) (actual time=0.502..0.503 rows=0 loops=1)
                           Index Cond: ((deployments.project_id = 278964) AND ((deployments.sha)::text = 'dc267aceb79eb167c42438be0807d46df6cba56f'::text))
                           Buffers: shared read=4
                           I/O Timings: read=0.429
         ->  Index Scan using environments_pkey on public.environments  (cost=0.43..3.45 rows=1 width=148) (actual time=0.000..0.000 rows=0 loops=0)
               Index Cond: (environments.id = deployments.environment_id)
               Filter: ((environments.project_id = 278964) AND ((environments.state)::text = 'available'::text))
               Rows Removed by Filter: 0
         SubPlan 2
           ->  Result  (cost=0.63..0.64 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                 InitPlan 1 (returns $1)
                   ->  Limit  (cost=0.57..0.63 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                         ->  Index Only Scan using index_deployments_on_environment_id_and_id on public.deployments deployments_1  (cost=0.57..39.72 rows=600 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                               Index Cond: ((deployments_1.environment_id = environments.id) AND (deployments_1.id IS NOT NULL))
                               Heap Fetches: 0

Query Up

== 20200930131343 AddIndexOnProjectIdAndShaToDeployments: migrating ===========
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:deployments, [:project_id, :sha], {:name=>"index_deployments_on_project_id_sha", :algorithm=>:concurrently})
   -> 0.0070s
-- execute("SET statement_timeout TO 0")
   -> 0.0001s
-- add_index(:deployments, [:project_id, :sha], {:name=>"index_deployments_on_project_id_sha", :algorithm=>:concurrently})
   -> 0.0037s
-- execute("RESET ALL")
   -> 0.0002s
== 20200930131343 AddIndexOnProjectIdAndShaToDeployments: migrated (0.0114s) ==

Query Down

== 20200930131343 AddIndexOnProjectIdAndShaToDeployments: reverting ===========
-- transaction_open?()
   -> 0.0000s
-- indexes(:deployments)
   -> 0.0092s
-- execute("SET statement_timeout TO 0")
   -> 0.0002s
-- remove_index(:deployments, {:algorithm=>:concurrently, :name=>"index_deployments_on_project_id_sha"})
   -> 0.0017s
-- execute("RESET ALL")
   -> 0.0001s
== 20200930131343 AddIndexOnProjectIdAndShaToDeployments: reverted (0.0116s) ==

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Marc Shaw

Merge request reports