Narrow snippet search scope in GitLab.com
What does this MR do?
This MR sets a more restrictive scope for snippets searches in GitLab.com
.
At the moment, we use the following scope to search for snippets:
- public or internal personal snippets (not attached to any project)
- public projects where the snippets are public (without the need of having access to the project)
- projects where the user have access
The number of snippets we have to process is quite high and, therefore, the request times out.
For self-hosted installations with fewer snippets this scope should be fine, but for Gitlab.com
is not sustainable.
Only for Gitlab.com
we're going to narrow the scope a little bit. From now on, when the user is authenticated the search will look for snippets in:
- personals snippets authored by the user
- project snippets where the user is a member of the project
When the user is not authenticated the scope will be:
- public personal snippets
We are adding as well the flag restricted_snippet_scope_search
(enabled by default), in order to disable this new behavior in case something goes wrong.
New queries and times
These are the old and new queries/plans when the user is authenticated:
Original query
SELECT COUNT(*)
FROM
(SELECT "snippets".*
FROM "snippets"
WHERE (snippets.visibility_level IN (10,
20)
OR snippets.author_id = 1675774)
AND "snippets"."project_id" IS NULL
UNION SELECT "snippets".*
FROM "snippets"
INNER JOIN "projects" ON "projects"."id" = "snippets"."project_id"
INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
WHERE (snippets.visibility_level IN (10,
20)
OR snippets.author_id = 1675774)
AND (projects.visibility_level IN (10,
20))
AND "project_features"."snippets_access_level" IN (20,
30)
UNION SELECT "snippets".*
FROM "snippets"
INNER JOIN "projects" ON "projects"."id" = "snippets"."project_id"
INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
WHERE "project_features"."snippets_access_level" IN (20,
30,
10)
AND (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE (project_id = snippets.project_id)
AND "project_authorizations"."user_id" = 1675774))) snippets
WHERE "snippets"."content" ILIKE '%foo%'
LIMIT 100
Original Plan
Limit (cost=153053.18..153053.19 rows=1 width=8) (actual time=32839.224..32839.225 rows=1 loops=1)
Buffers: shared hit=1717908
-> Aggregate (cost=153053.18..153053.19 rows=1 width=8) (actual time=32839.223..32839.223 rows=1 loops=1)
Buffers: shared hit=1717908
-> HashAggregate (cost=153048.66..153050.67 rows=201 width=1744) (actual time=32837.848..32839.065 rows=2034 loops=1)
Group Key: snippets.id, snippets.title, snippets.content, snippets.author_id, snippets.project_id, snippets.created_at, snippets.updated_at, snippets.file_name, snippets.type, snippets.visibility_level, snippets.title_html, snippets.content_html, snippets.cached_markdown_version, snippets.description, snippets.description_html
Buffers: shared hit=1717908
-> Append (cost=0.42..153041.12 rows=201 width=1744) (actual time=2.447..31286.061 rows=2034 loops=1)
Buffers: shared hit=1671765
-> Index Scan using index_snippets_on_project_id on public.snippets (cost=0.42..70329.44 rows=173 width=1083) (actual time=2.445..19620.180 rows=1857 loops=1)
Index Cond: (snippets.project_id IS NULL)
Filter: ((snippets.content ~~* '%foo%'::text) AND ((snippets.visibility_level = ANY ('{10,20}'::integer[])) OR (snippets.author_id = 1675774)))
Rows Removed by Filter: 142898
Buffers: shared hit=216834
-> Nested Loop (cost=1.75..82389.92 rows=27 width=1083) (actual time=21.955..11663.712 rows=177 loops=1)
Buffers: shared hit=1454911
-> Merge Join (cost=1.32..82366.81 rows=34 width=1087) (actual time=21.921..11660.436 rows=180 loops=1)
Buffers: shared hit=1454190
-> Index Scan using index_snippets_on_project_id on public.snippets snippets_1 (cost=0.42..84372.64 rows=251 width=1083) (actual time=21.625..10771.529 rows=375 loops=1)
Filter: ((snippets_1.content ~~* '%foo%'::text) AND ((snippets_1.visibility_level = ANY ('{10,20}'::integer[])) OR (snippets_1.author_id = 1675774)))
Rows Removed by Filter: 64103
Buffers: shared hit=126449
-> Index Only Scan using index_projects_on_id_partial_for_visibility on public.projects (cost=0.43..52876.23 rows=1442701 width=4) (actual time=0.101..781.427 rows=1418247 loops=1)
Heap Fetches: 198374
Buffers: shared hit=1327741
-> Index Scan using index_project_features_on_project_id on public.project_features (cost=0.43..0.67 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=180)
Index Cond: (project_features.project_id = projects.id)
Filter: (project_features.snippets_access_level = ANY ('{20,30}'::integer[]))
Rows Removed by Filter: 0
Buffers: shared hit=721
-> Nested Loop (cost=7.93..319.75 rows=1 width=1083) (actual time=0.130..0.130 rows=0 loops=1)
Buffers: shared hit=20
-> Nested Loop (cost=7.50..319.25 rows=1 width=1091) (actual time=0.129..0.129 rows=0 loops=1)
Buffers: shared hit=20
-> Nested Loop (cost=7.08..269.78 rows=104 width=8) (actual time=0.094..0.103 rows=2 loops=1)
Buffers: shared hit=14
-> HashAggregate (cost=6.64..7.68 rows=104 width=4) (actual time=0.074..0.076 rows=2 loops=1)
Group Key: project_authorizations.project_id
Buffers: shared hit=6
-> Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on public.project_authorizations (cost=0.56..6.38 rows=104 width=4) (actual time=0.055..0.065 rows=2 loops=1)
Index Cond: (project_authorizations.user_id = 1675774)
Heap Fetches: 0
Buffers: shared hit=6
-> Index Only Scan using projects_pkey on public.projects projects_1 (cost=0.43..2.51 rows=1 width=4) (actual time=0.011..0.011 rows=1 loops=2)
Index Cond: (projects_1.id = project_authorizations.project_id)
Heap Fetches: 0
Buffers: shared hit=8
-> Index Scan using index_snippets_on_project_id on public.snippets snippets_2 (cost=0.42..0.47 rows=1 width=1083) (actual time=0.011..0.011 rows=0 loops=2)
Index Cond: (snippets_2.project_id = projects_1.id)
Filter: (snippets_2.content ~~* '%foo%'::text)
Rows Removed by Filter: 0
Buffers: shared hit=6
-> Index Scan using index_project_features_on_project_id on public.project_features project_features_1 (cost=0.43..0.49 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (project_features_1.project_id = projects_1.id)
Filter: (project_features_1.snippets_access_level = ANY ('{20,30,10}'::integer[]))
Rows Removed by Filter: 0
Original Execution Times
Cost: 207935.25
Time: 32.885 s
- planning: 44.964 ms
- execution: 32.840 s
Shared buffers:
- hits: 1717908 (~13.10 GiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
New Query
SELECT COUNT(*)
FROM
(SELECT "snippets".*
FROM "snippets"
WHERE "snippets"."author_id" = 23
AND "snippets"."project_id" IS NULL
UNION SELECT "snippets".*
FROM "snippets"
INNER JOIN "projects" ON "projects"."id" = "snippets"."project_id"
INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
WHERE "project_features"."snippets_access_level" IN (20,
30,
10)
AND (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE (project_id = snippets.project_id)
AND "project_authorizations"."user_id" = 23))) snippets
WHERE "snippets"."content" ILIKE '%foo%'
LIMIT 100
New Plan
Limit (cost=327.79..327.80 rows=1 width=8) (actual time=0.144..0.144 rows=1 loops=1)
Buffers: shared hit=16
-> Aggregate (cost=327.79..327.80 rows=1 width=8) (actual time=0.142..0.142 rows=1 loops=1)
Buffers: shared hit=16
-> Unique (cost=327.69..327.77 rows=2 width=1744) (actual time=0.140..0.140 rows=0 loops=1)
Buffers: shared hit=16
-> Sort (cost=327.69..327.69 rows=2 width=1744) (actual time=0.140..0.140 rows=0 loops=1)
Sort Key: snippets.id, snippets.title, snippets.content, snippets.author_id, snippets.project_id, snippets.created_at, snippets.updated_at, snippets.file_name, snippets.type, snippets.visibility_level, snippets.title_html, snippets.content_html, snippets.cached_markdown_version, snippets.description, snippets.description_html
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=16
-> Append (cost=0.42..327.68 rows=2 width=1744) (actual time=0.082..0.082 rows=0 loops=1)
Buffers: shared hit=7
-> Index Scan using index_snippets_on_author_id on public.snippets (cost=0.42..7.91 rows=1 width=1083) (actual time=0.026..0.026 rows=0 loops=1)
Index Cond: (snippets.author_id = 23)
Filter: ((snippets.project_id IS NULL) AND (snippets.content ~~* '%foo%'::text))
Rows Removed by Filter: 0
Buffers: shared hit=3
-> Nested Loop (cost=7.93..319.75 rows=1 width=1083) (actual time=0.056..0.056 rows=0 loops=1)
Buffers: shared hit=4
-> Nested Loop (cost=7.50..319.25 rows=1 width=1091) (actual time=0.055..0.055 rows=0 loops=1)
Buffers: shared hit=4
-> Nested Loop (cost=7.08..269.78 rows=104 width=8) (actual time=0.055..0.056 rows=0 loops=1)
Buffers: shared hit=4
-> HashAggregate (cost=6.64..7.68 rows=104 width=4) (actual time=0.055..0.055 rows=0 loops=1)
Group Key: project_authorizations.project_id
Buffers: shared hit=4
-> Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on public.project_authorizations (cost=0.56..6.38 rows=104 width=4) (actual time=0.054..0.054 rows=0 loops=1)
Index Cond: (project_authorizations.user_id = 23)
Heap Fetches: 0
Buffers: shared hit=4
-> Index Only Scan using projects_pkey on public.projects (cost=0.43..2.51 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (projects.id = project_authorizations.project_id)
Heap Fetches: 0
-> Index Scan using index_snippets_on_project_id on public.snippets snippets_1 (cost=0.42..0.47 rows=1 width=1083) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (snippets_1.project_id = projects.id)
Filter: (snippets_1.content ~~* '%foo%'::text)
Rows Removed by Filter: 0
-> Index Scan using index_project_features_on_project_id on public.project_features (cost=0.43..0.49 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (project_features.project_id = projects.id)
Filter: (project_features.snippets_access_level = ANY ('{20,30,10}'::integer[]))
Rows Removed by Filter: 0
New Execution Times
Cost: 327.80
Time: 32.353 ms
- planning: 31.953 ms
- execution: 0.400 ms
Shared buffers:
- hits: 16 (~128.00 KiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
When the user is not authenticated:
Old Query
SELECT COUNT(*)
FROM
(SELECT “snippets”.*
FROM “snippets”
WHERE “snippets”.“visibility_level” = 20
AND “snippets”.“project_id” IS NULL
UNION SELECT “snippets”.*
FROM “snippets”
INNER JOIN “projects” ON “projects”.“id” = “snippets”.“project_id”
INNER JOIN “project_features” ON “project_features”.“project_id” = “projects”.“id”
WHERE “snippets”.“visibility_level” = 20
AND (projects.visibility_level IN (20))
AND “project_features”.“snippets_access_level” IN (20,
30)) snippets
WHERE “snippets”.“content” ILIKE ‘%foo%’
LIMIT 100
Old Plan
Limit (cost=85851.42..85851.43 rows=1 width=8) (actual time=22951.117..22951.117 rows=1 loops=1)
Buffers: shared hit=257434 read=99
-> Aggregate (cost=85851.42..85851.43 rows=1 width=8) (actual time=22951.115..22951.115 rows=1 loops=1)
Buffers: shared hit=257434 read=99
-> HashAggregate (cost=85847.98..85849.51 rows=153 width=1744) (actual time=22950.019..22950.992 rows=1729 loops=1)
Group Key: snippets.id, snippets.title, snippets.content, snippets.author_id, snippets.project_id, snippets.created_at, snippets.updated_at, snippets.file_name, snippets.type, snippets.visibility_level, snippets.title_html, snippets.content_html, snippets.cached_markdown_version, snippets.description, snippets.description_html
Buffers: shared hit=257434 read=99
-> Append (cost=0.42..85842.24 rows=153 width=1744) (actual time=1.221..21570.896 rows=1729 loops=1)
Buffers: shared hit=214498 read=99
-> Index Scan using index_snippets_on_visibility_level on public.snippets (cost=0.42..42483.83 rows=134 width=1083) (actual time=1.221..9956.147 rows=1571 loops=1)
Index Cond: (snippets.visibility_level = 20)
Filter: ((snippets.project_id IS NULL) AND (snippets.content ~~* '%foo%'::text))
Rows Removed by Filter: 48952
Buffers: shared hit=101990
-> Nested Loop (cost=1.28..43356.88 rows=19 width=1083) (actual time=16.913..11613.804 rows=158 loops=1)
Buffers: shared hit=112508 read=99
-> Nested Loop (cost=0.85..43340.13 rows=24 width=1087) (actual time=16.883..11610.502 rows=161 loops=1)
Buffers: shared hit=111863 read=99
-> Index Scan using index_snippets_on_visibility_level on public.snippets snippets_1 (cost=0.42..42483.83 rows=193 width=1083) (actual time=5.978..11383.736 rows=1863 loops=1)
Index Cond: (snippets_1.visibility_level = 20)
Filter: (snippets_1.content ~~* '%foo%'::text)
Rows Removed by Filter: 48660
Buffers: shared hit=110920
-> Index Scan using index_projects_on_id_partial_for_visibility on public.projects (cost=0.43..4.43 rows=1 width=4) (actual time=0.120..0.120 rows=0 loops=1863)
Index Cond: (projects.id = snippets_1.project_id)
Filter: (projects.visibility_level = 20)
Rows Removed by Filter: 0
Buffers: shared hit=943 read=99
-> Index Scan using index_project_features_on_project_id on public.project_features (cost=0.43..0.69 rows=1 width=4) (actual time=0.016..0.017 rows=1 loops=161)
Index Cond: (project_features.project_id = projects.id)
Filter: (project_features.snippets_access_level = ANY ('{20,30}'::integer[]))
Rows Removed by Filter: 0
Buffers: shared hit=645
Old Execution Times
Cost: 85851.43
Time: 22.973 s
- planning: 21.950 ms
- execution: 22.951 s
- I/O read: 216.157 ms
Shared buffers:
- hits: 257434 (~2.00 GiB) from the buffer pool
- reads: 99 (~792.00 KiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
New Query
SELECT COUNT(*)
FROM “snippets”
WHERE “snippets”.“visibility_level” = 20
AND “snippets”.“project_id” IS NULL
AND “snippets”.“content” ILIKE '%foo%'
LIMIT 100
New Plan
Limit (cost=42484.16..42484.17 rows=1 width=8) (actual time=9824.331..9824.332 rows=1 loops=1)
Buffers: shared hit=101990
-> Aggregate (cost=42484.16..42484.17 rows=1 width=8) (actual time=9824.328..9824.329 rows=1 loops=1)
Buffers: shared hit=101990
-> Index Scan using index_snippets_on_visibility_level on public.snippets (cost=0.42..42483.83 rows=134 width=0) (actual time=1.235..9823.070 rows=1571 loops=1)
Index Cond: (snippets.visibility_level = 20)
Filter: ((snippets.project_id IS NULL) AND (snippets.content ~~* '%foo%'::text))
Rows Removed by Filter: 48952
Buffers: shared hit=101990
New Execution Times
Cost: 42484.17
Time: 9.835 s
- planning: 10.201 ms
- execution: 9.824 s
Shared buffers:
- hits: 101990 (~796.80 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Separation of EE specific content
Closes #26123 (closed)