Skip to content

Similarity sorting in group projects API

Adam Hegyi requested to merge rudimentary-similarity-sorting into master

What does this MR do?

Related issue: #221043 (closed)

This MR introduces similarity search based on the incoming search parameter for the group projects API endpoint. The goal is to show better matches on top when using the projects dropdown.

Note: The change is behind a feature flag. I plan to use this feature in a follow up issue for the project dropdown at https://gitlab.com/groups/gitlab-org/-/analytics/value_stream_analytics

Query

Current: orders by last_activity_at DESC

API endpoint: api/v4/groups/9970/projects.json?search=git&per_page=50&with_shared=false&order_by=last_activity_at&include_subgroups=true

SELECT "projects".*
FROM   "projects"
WHERE  "projects"."namespace_id" IN (WITH recursive "base_and_descendants" AS (
                                     (
                                            SELECT "namespaces".*
                                            FROM   "namespaces"
                                            WHERE  "namespaces"."type" = 'Group'
                                            AND    "namespaces"."id" = 9970)
                              UNION
                                    (
                                           SELECT "namespaces".*
                                           FROM   "namespaces",
                                                  "base_and_descendants"
                                           WHERE  "namespaces"."type" = 'Group'
                                           AND    "namespaces"."parent_id" = "base_and_descendants"."id"))SELECT   "id"
                               FROM     "base_and_descendants" AS "namespaces")
AND      (
                  EXISTS
                  (
                         SELECT 1
                         FROM   "project_authorizations"
                         WHERE  "project_authorizations"."user_id" = 4156052
                         AND    (
                                       project_authorizations.project_id = projects.id))
         OR       projects.visibility_level IN (0,10,20))
AND      ((
                           "projects"."path" ilike '%git%'
                  OR       "projects"."name" ilike '%git%')
         OR       "projects"."description" ilike '%git%')
AND      "projects"."pending_delete" = false
ORDER BY "projects"."last_activity_at" DESC,
         "projects"."id" DESC limit 50 offset 0

Plan

Top 10 results:

         name          |         path
-----------------------+-----------------------
 gitter-android-app    | gitter-android-app
 GitLab                | gitlab
 gitaly                | gitaly
 gitlab-runner         | gitlab-runner
 Pajamas Design System | design.gitlab.com
 GitLab UX Research    | ux-research
 triage-ops            | triage-ops
 customers-gitlab-com  | customers-gitlab-com
 omnibus-gitlab-mirror | omnibus-gitlab-mirror
 omnibus-gitlab        | omnibus-gitlab

New: Order by similarity DESC

The only difference is the ORDER BY expression.

API endpoint: api/v4/groups/9970/projects.json?search=git&per_page=50&with_shared=false&order_by=similarity&include_subgroups=true

 SELECT "projects".*
FROM   "projects"
WHERE  "projects"."namespace_id" IN (WITH recursive "base_and_descendants" AS (
                                     (
                                            SELECT "namespaces".*
                                            FROM   "namespaces"
                                            WHERE  "namespaces"."type" = 'Group'
                                            AND    "namespaces"."id" = 9970)
                              UNION
                                    (
                                           SELECT "namespaces".*
                                           FROM   "namespaces",
                                                  "base_and_descendants"
                                           WHERE  "namespaces"."type" = 'Group'
                                           AND    "namespaces"."parent_id" = "base_and_descendants"."id"))SELECT   "id"
                               FROM     "base_and_descendants" AS "namespaces")
AND      (
                  EXISTS
                  (
                         SELECT 1
                         FROM   "project_authorizations"
                         WHERE  "project_authorizations"."user_id" = 4156052
                         AND    (
                                       project_authorizations.project_id = projects.id))
         OR       projects.visibility_level IN (0,10,20))
AND      ((
                           "projects"."path" ilike '%git%'
                  OR       "projects"."name" ilike '%git%')
         OR       "projects"."description" ilike '%git%')
AND      "projects"."pending_delete" = false
ORDER BY (similarity(COALESCE("projects"."path", ''), 'git') * cast('1' AS numeric)) + (similarity(COALESCE("projects"."name", ''), 'git') * cast('0.7' AS numeric)) + (similarity(COALESCE("projects"."description", ''), 'git') * cast('0.2' AS numeric)) DESC,
         "projects"."id" DESC limit 50 offset 0

Plan - A few ms overhead because of the similarity calculation.

Top 10 results:

      name       |      path
-----------------+-----------------
 Git             | git
 GitLab-Git      | gitlab-git
 gitlab_git      | gitlab_git
 -----           | -----
 -----           | -----
 gitaly          | gitaly
 GitLab          | gitlab
 gitlab-grit     | gitlab-grit
 gitlab-git-test | gitlab-git-test
 GitLab Chart    | gitlab

Removed 2 private projects from the list.

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Adam Hegyi

Merge request reports