Skip to content

Add similarity sort to Groups API search

What does this MR do?

Related to #332889 (closed)

Adds a similarity sort capability to GET /groups when search parameter is provided. I took inspiration from !37300 (merged) to implement this.

I plan to use this in a future MR to apply similarity sort to the Search page's Groups drop down to help users find the groups they are searching for easier.

The main goals are to:

  • have exact matches return first
  • return top level groups before subgroups

Database

Run as Admin

Taken from master branch

Before plan: https://explain.depesz.com/s/oBUO

Before SQL
SELECT
    "namespaces".*
FROM
    "namespaces"
WHERE
    "namespaces"."type" = 'Group'
    AND "namespaces"."id" IN (
        SELECT
            "routes"."source_id"
        FROM
            "routes"
        WHERE
            "routes"."source_type" = 'Namespace'
            AND ("routes"."path" ILIKE '%gitlab.org%'
                OR "routes"."name" ILIKE '%gitlab.org%'))
ORDER BY
    "namespaces"."name" ASC,
    "namespaces"."id" ASC
LIMIT 20 OFFSET 0

After plan: https://explain.depesz.com/s/bzR

After SQL
SELECT
    "namespaces".*,
    ROUND(CAST((( /* gitlab/database/similarity_score */ SIMILARITY (COALESCE("namespaces"."path", ''), 'gitlab') * CAST('1' AS numeric)) + (/* gitlab/database/similarity_score */ SIMILARITY(COALESCE("namespaces"."name", ''), 'gitlab') * CAST('0.7' AS numeric))) AS numeric), 2) AS similarity,
    "namespaces"."parent_id" AS parent_id,
    "namespaces"."id" AS id
FROM
    "namespaces"
WHERE
    "namespaces"."type" = 'Group'
    AND "namespaces"."id" IN (
        SELECT
            "routes"."source_id"
        FROM
            "routes"
        WHERE
            "routes"."source_type" = 'Namespace'
            AND ("routes"."path" ILIKE '%gitlab.org%'
                OR "routes"."name" ILIKE '%gitlab.org%'))
ORDER BY
    ROUND(CAST((( /* gitlab/database/similarity_score */ SIMILARITY (COALESCE("namespaces"."path", ''), 'gitlab.org') * CAST('1' AS numeric)) + (/* gitlab/database/similarity_score */ SIMILARITY(COALESCE("namespaces"."name", ''), 'gitlab.org') * CAST('0.7' AS numeric))) AS numeric), 2) DESC,
    namespaces.parent_id DESC NULLS LAST,
    "namespaces"."id" DESC
LIMIT 20 OFFSET 0

Run as non-Admin (developer role)

Taken from GitLab.com, search page, group dropdown, performance bar

Before Plan: https://explain.depesz.com/s/QZOB

Before SQL
WITH RECURSIVE "base_and_ancestors" AS ((
        SELECT
            "namespaces".*
        FROM (( WITH "direct_groups" AS MATERIALIZED (
                    SELECT
                        "namespaces".*
                    FROM ((
                            SELECT
                                "namespaces".*
                            FROM
                                "namespaces"
                                INNER JOIN "members" ON "namespaces"."id" = "members"."source_id"
                            WHERE
                                "members"."type" = 'GroupMember'
                                AND "members"."source_type" = 'Namespace'
                                AND "namespaces"."type" = 'Group'
                                AND "members"."user_id" = 5708766
                                AND "members"."requested_at" IS NULL
                                AND (
                                    access_level >= 10
)
)
                        UNION (
                            SELECT
                                namespaces.*
                            FROM
                                "projects"
                                INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id"
                                INNER JOIN "namespaces" ON "namespaces"."id" = "projects"."namespace_id"
                            WHERE
                                "project_authorizations"."user_id" = 5708766
)
) namespaces
                    WHERE
                        "namespaces"."type" = 'Group'
)
                    SELECT
                        "namespaces".*
                    FROM ((
                            SELECT
                                "namespaces".*
                            FROM
                                "direct_groups" "namespaces"
                            WHERE
                                "namespaces"."type" = 'Group'
)
                        UNION (
                            SELECT
                                "namespaces".*
                            FROM
                                "namespaces"
                                INNER JOIN "group_group_links" ON "group_group_links"."shared_group_id" = "namespaces"."id"
                            WHERE
                                "namespaces"."type" = 'Group'
                                AND "group_group_links"."shared_with_group_id" IN (
                                    SELECT
                                        "namespaces"."id"
                                    FROM
                                        "direct_groups" "namespaces"
                                    WHERE
                                        "namespaces"."type" = 'Group'
)
)
) namespaces
                        WHERE
                            "namespaces"."type" = 'Group'
)
                    UNION (
                        SELECT
                            "namespaces".*
                        FROM
                            "namespaces"
                            INNER JOIN "members" ON "namespaces"."id" = "members"."source_id"
                        WHERE
                            "members"."type" = 'GroupMember'
                            AND "members"."source_type" = 'Namespace'
                            AND "namespaces"."type" = 'Group'
                            AND "members"."user_id" = 5708766
                            AND "members"."access_level" = 5
                            AND (
                                EXISTS (
                                    SELECT
                                        1
                                    FROM
                                        "plans"
                                        INNER JOIN "gitlab_subscriptions" ON "gitlab_subscriptions"."hosted_plan_id" = "plans"."id"
                                    WHERE
                                        "plans"."name" IN (
                                            'silver', 'premium', 'premium_trial', 'gold', 'ultimate', 'ultimate_trial'
)
                                        AND (
                                            gitlab_subscriptions.namespace_id = namespaces.id
)
)
)
)
) namespaces
                    WHERE
                        "namespaces"."type" = 'Group'
)
                UNION (
                    SELECT
                        "namespaces".*
                    FROM
                        "namespaces",
                        "base_and_ancestors"
                    WHERE
                        "namespaces"."type" = 'Group'
                        AND "namespaces"."id" = "base_and_ancestors"."parent_id"
)
),
            "base_and_descendants" AS ((
                    SELECT
                        "namespaces".*
                    FROM
                        "namespaces"
                        INNER JOIN "members" ON "namespaces"."id" = "members"."source_id"
                    WHERE
                        "members"."type" = 'GroupMember'
                        AND "members"."source_type" = 'Namespace'
                        AND "namespaces"."type" = 'Group'
                        AND "members"."user_id" = 5708766
                        AND "members"."requested_at" IS NULL
                        AND (access_level >= 10))
                UNION (
                    SELECT
                        "namespaces".*
                    FROM
                        "namespaces",
                        "base_and_descendants"
                    WHERE
                        "namespaces"."type" = 'Group'
                        AND "namespaces"."parent_id" = "base_and_descendants"."id"))
        SELECT
            "namespaces".*
        FROM ((
                SELECT
                    "namespaces".*
                FROM
                    "base_and_ancestors" AS "namespaces"
                WHERE
                    "namespaces"."type" = 'Group')
            UNION (
                SELECT
                    "namespaces".*
                FROM
                    "base_and_descendants" AS "namespaces"
                WHERE
                    "namespaces"."type" = 'Group')) namespaces
    WHERE
        "namespaces"."type" = 'Group'
        AND "namespaces"."id" IN (
            SELECT
                "routes"."source_id"
            FROM
                "routes"
            WHERE
                "routes"."source_type" = 'Namespace'
                AND ("routes"."path" ILIKE '%gitlab%'
                    OR "routes"."name" ILIKE '%gitlab%'))
    ORDER BY
        "namespaces"."name" ASC,
        "namespaces"."id" ASC
    LIMIT 20 OFFSET 0

After Plan: https://explain.depesz.com/s/XTZR

After SQL
WITH RECURSIVE "base_and_ancestors" AS ((
        SELECT
            "namespaces".*
        FROM (( WITH "direct_groups" AS MATERIALIZED (
                    SELECT
                        "namespaces".*
                    FROM ((
                            SELECT
                                "namespaces".*
                            FROM
                                "namespaces"
                                INNER JOIN "members" ON "namespaces"."id" = "members"."source_id"
                            WHERE
                                "members"."type" = 'GroupMember'
                                AND "members"."source_type" = 'Namespace'
                                AND "namespaces"."type" = 'Group'
                                AND "members"."user_id" = 5708766
                                AND "members"."requested_at" IS NULL
                                AND (
                                    access_level >= 10
)
)
                        UNION (
                            SELECT
                                namespaces.*
                            FROM
                                "projects"
                                INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id"
                                INNER JOIN "namespaces" ON "namespaces"."id" = "projects"."namespace_id"
                            WHERE
                                "project_authorizations"."user_id" = 5708766
)
) namespaces
                    WHERE
                        "namespaces"."type" = 'Group'
)
                    SELECT
                        "namespaces".*
                    FROM ((
                            SELECT
                                "namespaces".*
                            FROM
                                "direct_groups" "namespaces"
                            WHERE
                                "namespaces"."type" = 'Group'
)
                        UNION (
                            SELECT
                                "namespaces".*
                            FROM
                                "namespaces"
                                INNER JOIN "group_group_links" ON "group_group_links"."shared_group_id" = "namespaces"."id"
                            WHERE
                                "namespaces"."type" = 'Group'
                                AND "group_group_links"."shared_with_group_id" IN (
                                    SELECT
                                        "namespaces"."id"
                                    FROM
                                        "direct_groups" "namespaces"
                                    WHERE
                                        "namespaces"."type" = 'Group'
)
)
) namespaces
                        WHERE
                            "namespaces"."type" = 'Group'
)
                    UNION (
                        SELECT
                            "namespaces".*
                        FROM
                            "namespaces"
                            INNER JOIN "members" ON "namespaces"."id" = "members"."source_id"
                        WHERE
                            "members"."type" = 'GroupMember'
                            AND "members"."source_type" = 'Namespace'
                            AND "namespaces"."type" = 'Group'
                            AND "members"."user_id" = 5708766
                            AND "members"."access_level" = 5
                            AND (
                                EXISTS (
                                    SELECT
                                        1
                                    FROM
                                        "plans"
                                        INNER JOIN "gitlab_subscriptions" ON "gitlab_subscriptions"."hosted_plan_id" = "plans"."id"
                                    WHERE
                                        "plans"."name" IN (
                                            'silver', 'premium', 'premium_trial', 'gold', 'ultimate', 'ultimate_trial'
)
                                        AND (
                                            gitlab_subscriptions.namespace_id = namespaces.id
)
)
)
)
) namespaces
                    WHERE
                        "namespaces"."type" = 'Group'
)
                UNION (
                    SELECT
                        "namespaces".*
                    FROM
                        "namespaces",
                        "base_and_ancestors"
                    WHERE
                        "namespaces"."type" = 'Group'
                        AND "namespaces"."id" = "base_and_ancestors"."parent_id"
)
),
            "base_and_descendants" AS ((
                    SELECT
                        "namespaces".*
                    FROM
                        "namespaces"
                        INNER JOIN "members" ON "namespaces"."id" = "members"."source_id"
                    WHERE
                        "members"."type" = 'GroupMember'
                        AND "members"."source_type" = 'Namespace'
                        AND "namespaces"."type" = 'Group'
                        AND "members"."user_id" = 5708766
                        AND "members"."requested_at" IS NULL
                        AND (access_level >= 10))
                UNION (
                    SELECT
                        "namespaces".*
                    FROM
                        "namespaces",
                        "base_and_descendants"
                    WHERE
                        "namespaces"."type" = 'Group'
                        AND "namespaces"."parent_id" = "base_and_descendants"."id"))
        SELECT
            "namespaces".*
        FROM ((
                SELECT
                    "namespaces".*
                FROM
                    "base_and_ancestors" AS "namespaces"
                WHERE
                    "namespaces"."type" = 'Group')
            UNION (
                SELECT
                    "namespaces".*
                FROM
                    "base_and_descendants" AS "namespaces"
                WHERE
                    "namespaces"."type" = 'Group')) namespaces
    WHERE
        "namespaces"."type" = 'Group'
        AND "namespaces"."id" IN (
            SELECT
                "routes"."source_id"
            FROM
                "routes"
            WHERE
                "routes"."source_type" = 'Namespace'
                AND ("routes"."path" ILIKE '%gitlab%'
                    OR "routes"."name" ILIKE '%gitlab%'))
    ORDER BY
        ROUND(CAST((( /* gitlab/database/similarity_score */ SIMILARITY (COALESCE("namespaces"."path", ''), 'gitlab') * CAST('1' AS numeric)) + ( /* gitlab/database/similarity_score */ SIMILARITY (COALESCE("namespaces"."name", ''), 'gitlab') * CAST('0.7' AS numeric))) AS numeric), 2) DESC,
        "namespaces"."parent_id" DESC NULLS LAST,
        "namespaces"."id" DESC
LIMIT 20 OFFSET 0

How to test

  • add a main group, example: gitlab.org
  • add subgroups and sub-subgroups under that group. Example:
image
  • name some of them gitlab, some of them other things
  • hit the API endpoint and search for gitlab: GET {{ host }}/api/v4/groups.json?order_by=similarity&search=gitlab
  • verify that the GitLab.org top level project comes back first
  • hit the API endpoint and search for gitlab.org: GET {{ host }}/api/v4/groups.json?order_by=similarity&search=gitlab
  • verify that the GitLab.org top level project comes back first
  • hit the API endpoint and do not send the search param: GET {{ host }}/api/v4/groups.json?order_by=similarity
  • verify that the results are sorted by name

Screenshots (strongly suggested)

Before (note taken from my local gdk with test groups setup, NOT GitLab.com)

I can't show the UI after because the frontend work will be done in another MR, but this is the output from the same API endpoint from the search UI drop down. If you search for gitlab, the main GitLab.org namespace does not show in the first 4 items.

image

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Terri Chu

Merge request reports