Add index for sha to deployments
What does this MR do?
Adds an index to the sha column for deployments
When looking for environments on projects with a large number of deployments (ie our main gitlab repo, but not other repos), the query is rather slow.
A particular page this effects is the commits show page on the main gitlab repo, ie:
We can see the query takes 700 MS
The place this query is called: https://gitlab.com/gitlab-org/gitlab/-/blob/21d3afd31104179a1b7496a67bb31d605ed9951a/app/finders/environments_finder.rb
The query in question:
SELECT "environments".* FROM "environments" WHERE "environments"."project_id" = 278964 AND ("environments"."state" IN ('available')) AND "environments"."id" IN (SELECT "deployments"."environment_id" FROM "deployments" WHERE "deployments"."project_id" = 278964 AND "deployments"."sha" = 'dc267aceb79eb167c42438be0807d46df6cba56f' GROUP BY "deployments"."environment_id") ORDER BY (SELECT MAX("deployments"."id") FROM "deployments" WHERE "deployments"."environment_id" = "environments"."id") DESC NULLS LAST
Looking at this query with/without this added index, with cold cache we get:
Without new index
Time: 3.150 min
- planning: 0.936 ms
- execution: 3.149 min
- I/O read: 3.111 min
- I/O write: 0.000 ms
Shared buffers:
- hits: 14266 (~111.50 MiB) from the buffer pool
- reads: 160831 (~1.20 GiB) from the OS file cache, including disk I/O
- dirtied: 2991 (~23.40 MiB)
- writes: 0
Sort (cost=272623.03..272623.04 rows=1 width=152) (actual time=188969.308..188969.308 rows=0 loops=1)
Sort Key: ((SubPlan 2)) DESC NULLS LAST
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=14266 read=160831 dirtied=2991
I/O Timings: read=186653.389
-> Nested Loop (cost=272619.35..272623.02 rows=1 width=152) (actual time=188969.251..188969.251 rows=0 loops=1)
Buffers: shared hit=14263 read=160831 dirtied=2991
I/O Timings: read=186653.389
-> Group (cost=272618.92..272618.93 rows=1 width=4) (actual time=188969.249..188969.249 rows=0 loops=1)
Group Key: deployments.environment_id
Buffers: shared hit=14263 read=160831 dirtied=2991
I/O Timings: read=186653.389
-> Sort (cost=272618.92..272618.93 rows=1 width=4) (actual time=188969.246..188969.247 rows=0 loops=1)
Sort Key: deployments.environment_id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=14263 read=160831 dirtied=2991
I/O Timings: read=186653.389
-> Index Scan using index_deployments_on_project_id_and_id on public.deployments (cost=0.57..272618.91 rows=1 width=4) (actual time=188969.233..188969.233 rows=0 loops=1)
Index Cond: (deployments.project_id = 278964)
Filter: ((deployments.sha)::text = 'dc267aceb79eb167c42438be0807d46df6cba56f'::text)
Rows Removed by Filter: 211871
Buffers: shared hit=14263 read=160831 dirtied=2991
I/O Timings: read=186653.389
-> Index Scan using environments_pkey on public.environments (cost=0.43..3.45 rows=1 width=148) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (environments.id = deployments.environment_id)
Filter: ((environments.project_id = 278964) AND ((environments.state)::text = 'available'::text))
Rows Removed by Filter: 0
SubPlan 2
-> Result (cost=0.62..0.63 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
InitPlan 1 (returns $1)
-> Limit (cost=0.57..0.62 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
-> Index Only Scan using index_deployments_on_environment_id_and_id on public.deployments deployments_1 (cost=0.57..33.25 rows=600 width=4) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: ((deployments_1.environment_id = environments.id) AND (deployments_1.id IS NOT NULL))
Heap Fetches: 0
With new index
Size of Index
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-------------------------------------+-------+--------+-------------+---------+-------------
public | index_deployments_on_project_id_sha | index | gitlab | deployments | 5965 MB |
(1 row)
Time taken to create index
The query has been executed. Duration: 9.319 min
Time: 1.696 ms
- planning: 1.082 ms
- execution: 0.614 ms
- I/O read: 0.429 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 0 from the buffer pool
- reads: 4 (~32.00 KiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Sort (cost=7.72..7.72 rows=1 width=152) (actual time=0.531..0.531 rows=0 loops=1)
Sort Key: ((SubPlan 2)) DESC NULLS LAST
Sort Method: quicksort Memory: 25kB
Buffers: shared read=4
I/O Timings: read=0.429
-> Nested Loop (cost=4.02..7.71 rows=1 width=152) (actual time=0.516..0.516 rows=0 loops=1)
Buffers: shared read=4
I/O Timings: read=0.429
-> Group (cost=3.60..3.60 rows=1 width=4) (actual time=0.516..0.516 rows=0 loops=1)
Group Key: deployments.environment_id
Buffers: shared read=4
I/O Timings: read=0.429
-> Sort (cost=3.60..3.60 rows=1 width=4) (actual time=0.515..0.515 rows=0 loops=1)
Sort Key: deployments.environment_id
Sort Method: quicksort Memory: 25kB
Buffers: shared read=4
I/O Timings: read=0.429
-> Index Scan using index_deployments_on_project_id_sha on public.deployments (cost=0.57..3.59 rows=1 width=4) (actual time=0.502..0.503 rows=0 loops=1)
Index Cond: ((deployments.project_id = 278964) AND ((deployments.sha)::text = 'dc267aceb79eb167c42438be0807d46df6cba56f'::text))
Buffers: shared read=4
I/O Timings: read=0.429
-> Index Scan using environments_pkey on public.environments (cost=0.43..3.45 rows=1 width=148) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (environments.id = deployments.environment_id)
Filter: ((environments.project_id = 278964) AND ((environments.state)::text = 'available'::text))
Rows Removed by Filter: 0
SubPlan 2
-> Result (cost=0.63..0.64 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
InitPlan 1 (returns $1)
-> Limit (cost=0.57..0.63 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
-> Index Only Scan using index_deployments_on_environment_id_and_id on public.deployments deployments_1 (cost=0.57..39.72 rows=600 width=4) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: ((deployments_1.environment_id = environments.id) AND (deployments_1.id IS NOT NULL))
Heap Fetches: 0
Query Up
== 20200930131343 AddIndexOnProjectIdAndShaToDeployments: migrating ===========
-- transaction_open?()
-> 0.0000s
-- index_exists?(:deployments, [:project_id, :sha], {:name=>"index_deployments_on_project_id_sha", :algorithm=>:concurrently})
-> 0.0070s
-- execute("SET statement_timeout TO 0")
-> 0.0001s
-- add_index(:deployments, [:project_id, :sha], {:name=>"index_deployments_on_project_id_sha", :algorithm=>:concurrently})
-> 0.0037s
-- execute("RESET ALL")
-> 0.0002s
== 20200930131343 AddIndexOnProjectIdAndShaToDeployments: migrated (0.0114s) ==
Query Down
== 20200930131343 AddIndexOnProjectIdAndShaToDeployments: reverting ===========
-- transaction_open?()
-> 0.0000s
-- indexes(:deployments)
-> 0.0092s
-- execute("SET statement_timeout TO 0")
-> 0.0002s
-- remove_index(:deployments, {:algorithm=>:concurrently, :name=>"index_deployments_on_project_id_sha"})
-> 0.0017s
-- execute("RESET ALL")
-> 0.0001s
== 20200930131343 AddIndexOnProjectIdAndShaToDeployments: reverted (0.0116s) ==
Screenshots
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team