Skip to content

Narrow snippet search scope in GitLab.com

What does this MR do?

This MR sets a more restrictive scope for snippets searches in GitLab.com.

At the moment, we use the following scope to search for snippets:

  • public or internal personal snippets (not attached to any project)
  • public projects where the snippets are public (without the need of having access to the project)
  • projects where the user have access

The number of snippets we have to process is quite high and, therefore, the request times out.

For self-hosted installations with fewer snippets this scope should be fine, but for Gitlab.com is not sustainable.

Only for Gitlab.com we're going to narrow the scope a little bit. From now on, when the user is authenticated the search will look for snippets in:

  • personals snippets authored by the user
  • project snippets where the user is a member of the project

When the user is not authenticated the scope will be:

  • public personal snippets

We are adding as well the flag restricted_snippet_scope_search (enabled by default), in order to disable this new behavior in case something goes wrong.

New queries and times

These are the old and new queries/plans when the user is authenticated:

Original query
SELECT COUNT(*)
FROM
  (SELECT "snippets".*
  FROM "snippets"
  WHERE (snippets.visibility_level IN (10,
                                        20)
          OR snippets.author_id = 1675774)
    AND "snippets"."project_id" IS NULL
  UNION SELECT "snippets".*
  FROM "snippets"
  INNER JOIN "projects" ON "projects"."id" = "snippets"."project_id"
  INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
  WHERE (snippets.visibility_level IN (10,
                                        20)
          OR snippets.author_id = 1675774)
    AND (projects.visibility_level IN (10,
                                        20))
    AND "project_features"."snippets_access_level" IN (20,
                                                        30)
  UNION SELECT "snippets".*
  FROM "snippets"
  INNER JOIN "projects" ON "projects"."id" = "snippets"."project_id"
  INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
  WHERE "project_features"."snippets_access_level" IN (20,
                                                        30,
                                                        10)
    AND (EXISTS
            (SELECT 1
            FROM "project_authorizations"
            WHERE (project_id = snippets.project_id)
              AND "project_authorizations"."user_id" = 1675774))) snippets
WHERE "snippets"."content" ILIKE '%foo%'
LIMIT 100
Original Plan
Limit  (cost=153053.18..153053.19 rows=1 width=8) (actual time=32839.224..32839.225 rows=1 loops=1)
   Buffers: shared hit=1717908
   ->  Aggregate  (cost=153053.18..153053.19 rows=1 width=8) (actual time=32839.223..32839.223 rows=1 loops=1)
         Buffers: shared hit=1717908
         ->  HashAggregate  (cost=153048.66..153050.67 rows=201 width=1744) (actual time=32837.848..32839.065 rows=2034 loops=1)
               Group Key: snippets.id, snippets.title, snippets.content, snippets.author_id, snippets.project_id, snippets.created_at, snippets.updated_at, snippets.file_name, snippets.type, snippets.visibility_level, snippets.title_html, snippets.content_html, snippets.cached_markdown_version, snippets.description, snippets.description_html
               Buffers: shared hit=1717908
               ->  Append  (cost=0.42..153041.12 rows=201 width=1744) (actual time=2.447..31286.061 rows=2034 loops=1)
                     Buffers: shared hit=1671765
                     ->  Index Scan using index_snippets_on_project_id on public.snippets  (cost=0.42..70329.44 rows=173 width=1083) (actual time=2.445..19620.180 rows=1857 loops=1)
                           Index Cond: (snippets.project_id IS NULL)
                           Filter: ((snippets.content ~~* '%foo%'::text) AND ((snippets.visibility_level = ANY ('{10,20}'::integer[])) OR (snippets.author_id = 1675774)))
                           Rows Removed by Filter: 142898
                           Buffers: shared hit=216834
                     ->  Nested Loop  (cost=1.75..82389.92 rows=27 width=1083) (actual time=21.955..11663.712 rows=177 loops=1)
                           Buffers: shared hit=1454911
                           ->  Merge Join  (cost=1.32..82366.81 rows=34 width=1087) (actual time=21.921..11660.436 rows=180 loops=1)
                                 Buffers: shared hit=1454190
                                 ->  Index Scan using index_snippets_on_project_id on public.snippets snippets_1  (cost=0.42..84372.64 rows=251 width=1083) (actual time=21.625..10771.529 rows=375 loops=1)
                                       Filter: ((snippets_1.content ~~* '%foo%'::text) AND ((snippets_1.visibility_level = ANY ('{10,20}'::integer[])) OR (snippets_1.author_id = 1675774)))
                                       Rows Removed by Filter: 64103
                                       Buffers: shared hit=126449
                                 ->  Index Only Scan using index_projects_on_id_partial_for_visibility on public.projects  (cost=0.43..52876.23 rows=1442701 width=4) (actual time=0.101..781.427 rows=1418247 loops=1)
                                       Heap Fetches: 198374
                                       Buffers: shared hit=1327741
                           ->  Index Scan using index_project_features_on_project_id on public.project_features  (cost=0.43..0.67 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=180)
                                 Index Cond: (project_features.project_id = projects.id)
                                 Filter: (project_features.snippets_access_level = ANY ('{20,30}'::integer[]))
                                 Rows Removed by Filter: 0
                                 Buffers: shared hit=721
                     ->  Nested Loop  (cost=7.93..319.75 rows=1 width=1083) (actual time=0.130..0.130 rows=0 loops=1)
                           Buffers: shared hit=20
                           ->  Nested Loop  (cost=7.50..319.25 rows=1 width=1091) (actual time=0.129..0.129 rows=0 loops=1)
                                 Buffers: shared hit=20
                                 ->  Nested Loop  (cost=7.08..269.78 rows=104 width=8) (actual time=0.094..0.103 rows=2 loops=1)
                                       Buffers: shared hit=14
                                       ->  HashAggregate  (cost=6.64..7.68 rows=104 width=4) (actual time=0.074..0.076 rows=2 loops=1)
                                             Group Key: project_authorizations.project_id
                                             Buffers: shared hit=6
                                             ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on public.project_authorizations  (cost=0.56..6.38 rows=104 width=4) (actual time=0.055..0.065 rows=2 loops=1)
                                                   Index Cond: (project_authorizations.user_id = 1675774)
                                                   Heap Fetches: 0
                                                   Buffers: shared hit=6
                                       ->  Index Only Scan using projects_pkey on public.projects projects_1  (cost=0.43..2.51 rows=1 width=4) (actual time=0.011..0.011 rows=1 loops=2)
                                             Index Cond: (projects_1.id = project_authorizations.project_id)
                                             Heap Fetches: 0
                                             Buffers: shared hit=8
                                 ->  Index Scan using index_snippets_on_project_id on public.snippets snippets_2  (cost=0.42..0.47 rows=1 width=1083) (actual time=0.011..0.011 rows=0 loops=2)
                                       Index Cond: (snippets_2.project_id = projects_1.id)
                                       Filter: (snippets_2.content ~~* '%foo%'::text)
                                       Rows Removed by Filter: 0
                                       Buffers: shared hit=6
                           ->  Index Scan using index_project_features_on_project_id on public.project_features project_features_1  (cost=0.43..0.49 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                                 Index Cond: (project_features_1.project_id = projects_1.id)
                                 Filter: (project_features_1.snippets_access_level = ANY ('{20,30,10}'::integer[]))
                                 Rows Removed by Filter: 0
Original Execution Times
Cost: 207935.25

Time: 32.885 s
  - planning: 44.964 ms
  - execution: 32.840 s

Shared buffers:
  - hits: 1717908 (~13.10 GiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
New Query
SELECT COUNT(*)
FROM
  (SELECT "snippets".*
   FROM "snippets"
   WHERE "snippets"."author_id" = 23
     AND "snippets"."project_id" IS NULL
   UNION SELECT "snippets".*
   FROM "snippets"
   INNER JOIN "projects" ON "projects"."id" = "snippets"."project_id"
   INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
   WHERE "project_features"."snippets_access_level" IN (20,
                                                        30,
                                                        10)
     AND (EXISTS
            (SELECT 1
             FROM "project_authorizations"
             WHERE (project_id = snippets.project_id)
               AND "project_authorizations"."user_id" = 23))) snippets
WHERE "snippets"."content" ILIKE '%foo%'
LIMIT 100
New Plan
Limit  (cost=327.79..327.80 rows=1 width=8) (actual time=0.144..0.144 rows=1 loops=1)
   Buffers: shared hit=16
   ->  Aggregate  (cost=327.79..327.80 rows=1 width=8) (actual time=0.142..0.142 rows=1 loops=1)
         Buffers: shared hit=16
         ->  Unique  (cost=327.69..327.77 rows=2 width=1744) (actual time=0.140..0.140 rows=0 loops=1)
               Buffers: shared hit=16
               ->  Sort  (cost=327.69..327.69 rows=2 width=1744) (actual time=0.140..0.140 rows=0 loops=1)
                     Sort Key: snippets.id, snippets.title, snippets.content, snippets.author_id, snippets.project_id, snippets.created_at, snippets.updated_at, snippets.file_name, snippets.type, snippets.visibility_level, snippets.title_html, snippets.content_html, snippets.cached_markdown_version, snippets.description, snippets.description_html
                     Sort Method: quicksort  Memory: 25kB
                     Buffers: shared hit=16
                     ->  Append  (cost=0.42..327.68 rows=2 width=1744) (actual time=0.082..0.082 rows=0 loops=1)
                           Buffers: shared hit=7
                           ->  Index Scan using index_snippets_on_author_id on public.snippets  (cost=0.42..7.91 rows=1 width=1083) (actual time=0.026..0.026 rows=0 loops=1)
                                 Index Cond: (snippets.author_id = 23)
                                 Filter: ((snippets.project_id IS NULL) AND (snippets.content ~~* '%foo%'::text))
                                 Rows Removed by Filter: 0
                                 Buffers: shared hit=3
                           ->  Nested Loop  (cost=7.93..319.75 rows=1 width=1083) (actual time=0.056..0.056 rows=0 loops=1)
                                 Buffers: shared hit=4
                                 ->  Nested Loop  (cost=7.50..319.25 rows=1 width=1091) (actual time=0.055..0.055 rows=0 loops=1)
                                       Buffers: shared hit=4
                                       ->  Nested Loop  (cost=7.08..269.78 rows=104 width=8) (actual time=0.055..0.056 rows=0 loops=1)
                                             Buffers: shared hit=4
                                             ->  HashAggregate  (cost=6.64..7.68 rows=104 width=4) (actual time=0.055..0.055 rows=0 loops=1)
                                                   Group Key: project_authorizations.project_id
                                                   Buffers: shared hit=4
                                                   ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on public.project_authorizations  (cost=0.56..6.38 rows=104 width=4) (actual time=0.054..0.054 rows=0 loops=1)
                                                         Index Cond: (project_authorizations.user_id = 23)
                                                         Heap Fetches: 0
                                                         Buffers: shared hit=4
                                             ->  Index Only Scan using projects_pkey on public.projects  (cost=0.43..2.51 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                                                   Index Cond: (projects.id = project_authorizations.project_id)
                                                   Heap Fetches: 0
                                       ->  Index Scan using index_snippets_on_project_id on public.snippets snippets_1  (cost=0.42..0.47 rows=1 width=1083) (actual time=0.000..0.000 rows=0 loops=0)
                                             Index Cond: (snippets_1.project_id = projects.id)
                                             Filter: (snippets_1.content ~~* '%foo%'::text)
                                             Rows Removed by Filter: 0
                                 ->  Index Scan using index_project_features_on_project_id on public.project_features  (cost=0.43..0.49 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                                       Index Cond: (project_features.project_id = projects.id)
                                       Filter: (project_features.snippets_access_level = ANY ('{20,30,10}'::integer[]))
                                       Rows Removed by Filter: 0
New Execution Times
Cost: 327.80

Time: 32.353 ms
  - planning: 31.953 ms
  - execution: 0.400 ms

Shared buffers:
  - hits: 16 (~128.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

When the user is not authenticated:

Old Query
SELECT COUNT(*)
FROM
 (SELECT snippets.*
  FROM snippets
  WHERE snippets.visibility_level = 20
    AND snippets.project_id IS NULL
  UNION SELECT snippets.*
  FROM snippets
  INNER JOIN projects ON projects.id = snippets.project_id
  INNER JOIN project_features ON project_features.project_id = projects.id
  WHERE snippets.visibility_level = 20
    AND (projects.visibility_level IN (20))
    AND project_features.snippets_access_level IN (20,
                                                       30)) snippets
WHERE snippets.content ILIKE %foo%
LIMIT 100
Old Plan
Limit  (cost=85851.42..85851.43 rows=1 width=8) (actual time=22951.117..22951.117 rows=1 loops=1)
   Buffers: shared hit=257434 read=99
   ->  Aggregate  (cost=85851.42..85851.43 rows=1 width=8) (actual time=22951.115..22951.115 rows=1 loops=1)
         Buffers: shared hit=257434 read=99
         ->  HashAggregate  (cost=85847.98..85849.51 rows=153 width=1744) (actual time=22950.019..22950.992 rows=1729 loops=1)
               Group Key: snippets.id, snippets.title, snippets.content, snippets.author_id, snippets.project_id, snippets.created_at, snippets.updated_at, snippets.file_name, snippets.type, snippets.visibility_level, snippets.title_html, snippets.content_html, snippets.cached_markdown_version, snippets.description, snippets.description_html
               Buffers: shared hit=257434 read=99
               ->  Append  (cost=0.42..85842.24 rows=153 width=1744) (actual time=1.221..21570.896 rows=1729 loops=1)
                     Buffers: shared hit=214498 read=99
                     ->  Index Scan using index_snippets_on_visibility_level on public.snippets  (cost=0.42..42483.83 rows=134 width=1083) (actual time=1.221..9956.147 rows=1571 loops=1)
                           Index Cond: (snippets.visibility_level = 20)
                           Filter: ((snippets.project_id IS NULL) AND (snippets.content ~~* '%foo%'::text))
                           Rows Removed by Filter: 48952
                           Buffers: shared hit=101990
                     ->  Nested Loop  (cost=1.28..43356.88 rows=19 width=1083) (actual time=16.913..11613.804 rows=158 loops=1)
                           Buffers: shared hit=112508 read=99
                           ->  Nested Loop  (cost=0.85..43340.13 rows=24 width=1087) (actual time=16.883..11610.502 rows=161 loops=1)
                                 Buffers: shared hit=111863 read=99
                                 ->  Index Scan using index_snippets_on_visibility_level on public.snippets snippets_1  (cost=0.42..42483.83 rows=193 width=1083) (actual time=5.978..11383.736 rows=1863 loops=1)
                                       Index Cond: (snippets_1.visibility_level = 20)
                                       Filter: (snippets_1.content ~~* '%foo%'::text)
                                       Rows Removed by Filter: 48660
                                       Buffers: shared hit=110920
                                 ->  Index Scan using index_projects_on_id_partial_for_visibility on public.projects  (cost=0.43..4.43 rows=1 width=4) (actual time=0.120..0.120 rows=0 loops=1863)
                                       Index Cond: (projects.id = snippets_1.project_id)
                                       Filter: (projects.visibility_level = 20)
                                       Rows Removed by Filter: 0
                                       Buffers: shared hit=943 read=99
                           ->  Index Scan using index_project_features_on_project_id on public.project_features  (cost=0.43..0.69 rows=1 width=4) (actual time=0.016..0.017 rows=1 loops=161)
                                 Index Cond: (project_features.project_id = projects.id)
                                 Filter: (project_features.snippets_access_level = ANY ('{20,30}'::integer[]))
                                 Rows Removed by Filter: 0
                                 Buffers: shared hit=645
Old Execution Times
Cost: 85851.43

Time: 22.973 s
  - planning: 21.950 ms
  - execution: 22.951 s
    - I/O read: 216.157 ms

Shared buffers:
  - hits: 257434 (~2.00 GiB) from the buffer pool
  - reads: 99 (~792.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
New Query
SELECT COUNT(*)
FROM snippets
WHERE snippets.visibility_level = 20
  AND snippets.project_id IS NULL
  AND snippets.content ILIKE '%foo%'
LIMIT 100
New Plan
Limit  (cost=42484.16..42484.17 rows=1 width=8) (actual time=9824.331..9824.332 rows=1 loops=1)
   Buffers: shared hit=101990
   ->  Aggregate  (cost=42484.16..42484.17 rows=1 width=8) (actual time=9824.328..9824.329 rows=1 loops=1)
         Buffers: shared hit=101990
         ->  Index Scan using index_snippets_on_visibility_level on public.snippets  (cost=0.42..42483.83 rows=134 width=0) (actual time=1.235..9823.070 rows=1571 loops=1)
               Index Cond: (snippets.visibility_level = 20)
               Filter: ((snippets.project_id IS NULL) AND (snippets.content ~~* '%foo%'::text))
               Rows Removed by Filter: 48952
               Buffers: shared hit=101990
New Execution Times
Cost: 42484.17

Time: 9.835 s
  - planning: 10.201 ms
  - execution: 9.824 s

Shared buffers:
  - hits: 101990 (~796.80 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Does this MR meet the acceptance criteria?

Conformity

Closes #26123 (closed)

Edited by 🤖 GitLab Bot 🤖

Merge request reports