Skip to content

Prioritize exact matches on project search

Jonathan Schafer requested to merge 235676-prioritize-exact-matches into master

What does this MR do?

This MR prioritizes exact matches for #235676 (closed).

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Database Query

With similarity sort

Time: 681.251 ms

EXPLAIN SELECT “projects”.* FROM “projects” INNER JOIN “project_authorizations” ON “projects”.“id” = “project_authorizations”.“project_id” INNER JOIN “routes” ON “routes”.“source_type” = ‘Project’ AND “routes”.“source_id” = “projects”.“id” WHERE “project_authorizations”.“user_id” = 4473655 AND ((“routes”.“path” ILIKE ‘%gitlab-org/gitlab%’ OR “routes”.“name” ILIKE ‘%gitlab-org/gitlab%‘) OR “projects”.“description” ILIKE ‘%gitlab-org/gitlab%‘) AND “projects”.“pending_delete” = false ORDER BY (/* gitlab/database/similarity_score */ SIMILARITY(COALESCE(“projects”.“path”, ‘’), ‘gitlab-org/gitlab’) * CAST(‘1’ AS numeric)) + (/* gitlab/database/similarity_score */ SIMILARITY(COALESCE(“projects”.“name”, ‘’), ‘gitlab-org/gitlab’) * CAST(‘0.7’ AS numeric)) + (/* gitlab/database/similarity_score */ SIMILARITY(COALESCE(“projects”.“description”, ‘’), ‘gitlab-org/gitlab’) * CAST(‘0.2’ AS numeric)) DESC, “projects”.“id” DESC LIMIT 20

Plan with execution:

 Limit  (cost=10803.06..10803.06 rows=1 width=733) (actual time=92.755..92.760 rows=20 loops=1)
   Buffers: shared hit=33603
   ->  Sort  (cost=10803.06..10803.06 rows=1 width=733) (actual time=92.753..92.755 rows=20 loops=1)
         Sort Key: ((((similarity((COALESCE(projects.path, ''::character varying))::text, 'gitlab-org/gitlab'::text) * '1'::double precision) + (similarity((COALESCE(projects.name, ''::character varying))::text, 'gitlab-org/gitlab'::text) * '0.7'::double precision)) + (similarity(COALESCE(projects.description, ''::text), 'gitlab-org/gitlab'::text) * '0.2'::double precision))) DESC, projects.id DESC
         Sort Method: top-N heapsort  Memory: 58kB
         Buffers: shared hit=33603
         ->  Nested Loop  (cost=1.57..10803.05 rows=1 width=733) (actual time=0.887..92.185 rows=121 loops=1)
               Buffers: shared hit=33603
               ->  Nested Loop  (cost=1.01..9184.70 rows=2519 width=729) (actual time=0.046..28.113 rows=3538 loops=1)
                     Buffers: shared hit=15885
                     ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on public.project_authorizations  (cost=0.57..595.85 rows=2519 width=4) (actual time=0.027..2.790 rows=3540 loops=1)
                           Index Cond: (project_authorizations.user_id = 4473655)
                           Heap Fetches: 831
                           Buffers: shared hit=1722
                     ->  Index Scan using projects_pkey on public.projects  (cost=0.43..3.41 rows=1 width=725) (actual time=0.006..0.006 rows=1 loops=3540)
                           Index Cond: (projects.id = project_authorizations.project_id)
                           Filter: (NOT projects.pending_delete)
                           Rows Removed by Filter: 0
                           Buffers: shared hit=14163
               ->  Index Scan using index_routes_on_source_type_and_source_id on public.routes  (cost=0.56..0.63 rows=1 width=52) (actual time=0.016..0.016 rows=0 loops=3538)
                     Index Cond: (((routes.source_type)::text = 'Project'::text) AND (routes.source_id = projects.id))
                     Filter: (((routes.path)::text ~~* '%gitlab-org/gitlab%'::text) OR ((routes.name)::text ~~* '%gitlab-org/gitlab%'::text) OR (projects.description ~~* '%gitlab-org/gitlab%'::text))
                     Rows Removed by Filter: 1
                     Buffers: shared hit=17718

Summary:

Time: 681.251 ms
  - planning: 14.537 ms
  - execution: 666.714 ms
    - I/O read: 525.132 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 19930 (~155.70 MiB) from the buffer pool
  - reads: 13679 (~106.90 MiB) from the OS file cache, including disk I/O
  - dirtied: 462 (~3.60 MiB)
  - writes: 0

Without similarity sort

Time: 269.743 ms

EXPLAIN SELECT “projects”.* FROM “projects” INNER JOIN “project_authorizations” ON “projects”.“id” = “project_authorizations”.“project_id” INNER JOIN “routes” ON “routes”.“source_type” = ‘Project’ AND “routes”.“source_id” = “projects”.“id” WHERE “project_authorizations”.“user_id” = 4473655 AND ((“routes”.“path” ILIKE ‘%gitlab-org/gitlab%’ OR “routes”.“name” ILIKE ‘%gitlab-org/gitlab%‘) OR “projects”.“description” ILIKE ‘%gitlab-org/gitlab%’) AND “projects”.“pending_delete” = false ORDER BY “projects”.“id” DESC LIMIT 20

Plan with execution

 Limit  (cost=1.57..10803.03 rows=1 width=725) (actual time=11.814..252.311 rows=20 loops=1)
   Buffers: shared hit=4285 read=3304 dirtied=135
   I/O Timings: read=211.069
   ->  Nested Loop  (cost=1.57..10803.03 rows=1 width=725) (actual time=11.813..252.292 rows=20 loops=1)
         Buffers: shared hit=4285 read=3304 dirtied=135
         I/O Timings: read=211.069
         ->  Nested Loop  (cost=1.01..9184.70 rows=2519 width=729) (actual time=0.791..161.352 rows=757 loops=1)
               Buffers: shared hit=1890 read=1904 dirtied=71
               I/O Timings: read=144.766
               ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on public.project_authorizations  (cost=0.57..595.85 rows=2519 width=4) (actual time=0.528..65.284 rows=757 loops=1)
                     Index Cond: (project_authorizations.user_id = 4473655)
                     Heap Fetches: 471
                     Buffers: shared hit=247 read=519 dirtied=24
                     I/O Timings: read=61.951
               ->  Index Scan using projects_pkey on public.projects  (cost=0.43..3.41 rows=1 width=725) (actual time=0.124..0.124 rows=1 loops=757)
                     Index Cond: (projects.id = project_authorizations.project_id)
                     Filter: (NOT projects.pending_delete)
                     Rows Removed by Filter: 0
                     Buffers: shared hit=1643 read=1385 dirtied=47
                     I/O Timings: read=82.815
         ->  Index Scan using index_routes_on_source_type_and_source_id on public.routes  (cost=0.56..0.63 rows=1 width=52) (actual time=0.118..0.118 rows=0 loops=757)
               Index Cond: (((routes.source_type)::text = 'Project'::text) AND (routes.source_id = projects.id))
               Filter: (((routes.path)::text ~~* '%gitlab-org/gitlab%'::text) OR ((routes.name)::text ~~* '%gitlab-org/gitlab%'::text) OR (projects.description ~~* '%gitlab-org/gitlab%'::text))
               Rows Removed by Filter: 1
               Buffers: shared hit=2395 read=1400 dirtied=64
               I/O Timings: read=66.303

Summary

Time: 269.743 ms
  - planning: 17.257 ms
  - execution: 252.486 ms
    - I/O read: 211.069 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 4285 (~33.50 MiB) from the buffer pool
  - reads: 3304 (~25.80 MiB) from the OS file cache, including disk I/O
  - dirtied: 135 (~1.10 MiB)
  - writes: 0
Edited by Jonathan Schafer

Merge request reports