Skip to content

Perform two-step Routable lookup by path

Andreas Brandl requested to merge ab-routable-two-step-search into master

What does this MR do?

In order to lookup a Project or Namespace by path, we prefer an exact match (case-sensitive) but in absence of that, we'd also take a case-insensitive match.

The case-insensitive matching with preference for the exact match is a bit more involved in SQL as the exact lookup. Yet, the majority of cases will be an exact match. The thinking here is that we can optimize the lookup by performing an exact match first and only if there is no result, we perform the case-insensitive lookup.

Data for GitLab.com:

  • We have about 15M records in routes table
  • About 2,500 routes exist where there's more than one record with the same lower(path)

It is possible for a user to craft requests that would always trigger the 2-step search (e.g. we have a route for /foo/bar, the request is always for /FOO/bar). In this case, the change at hand is not beneficial as it would run an additional query.

However, based on the data, it is highly likely that the vast majority of requests can be satisfied with an exact match only.

The context for this change is https://gitlab.com/gitlab-org/gitlab-ce/issues/64590#note_208156463.

Example

Before

The first example is an exact match:

[2] pry(..>)> Group.find_by_full_path('foo/group1'); nil
D, [2019-08-27T17:44:52.490869 #21675] DEBUG -- :   Group Load (3.9ms)  SELECT  "namespaces".* FROM "namespaces" INNER JOIN "routes" ON "routes"."source_id" = "namespaces"."id" AND "routes"."source_type" = $1 WHERE "namespaces"."type" IN ('Group') AND ((LOWER(routes.path) = LOWER('foo/group1'))) ORDER BY (CASE WHEN routes.path = 'foo/group1' THEN 0 ELSE 1 END) LIMIT $2  [["source_type", "Namespace"], ["LIMIT", 1]]
D, [2019-08-27T17:44:52.491291 #21675] DEBUG -- :   ↳ app/models/concerns/routable.rb:37
=> nil

Second example is without an exact match, but still retrieving the same group (same query as above):

[3] pry(..)> Group.find_by_full_path('foo/GROUP1'); nil
D, [2019-08-27T17:44:55.918519 #21675] DEBUG -- :   Group Load (2.6ms)  SELECT  "namespaces".* FROM "namespaces" INNER JOIN "routes" ON "routes"."source_id" = "namespaces"."id" AND "routes"."source_type" = $1 WHERE "namespaces"."type" IN ('Group') AND ((LOWER(routes.path) = LOWER('foo/GROUP1'))) ORDER BY (CASE WHEN routes.path = 'foo/GROUP1' THEN 0 ELSE 1 END) LIMIT $2  [["source_type", "Namespace"], ["LIMIT", 1]]
D, [2019-08-27T17:44:55.918855 #21675] DEBUG -- :   ↳ app/models/concerns/routable.rb:37
=> nil

After

The first example is an exact match:

[2] pry(..)> Group.find_by_full_path('foo/group1'); nil
D, [2019-08-27T17:41:42.331667 #20175] DEBUG -- :   Group Load (2.3ms)  SELECT  "namespaces".* FROM "namespaces" INNER JOIN "routes" ON "routes"."source_id" = "namespaces"."id" AND "routes"."source_type" = $1 WHERE "namespaces"."type" IN ('Group') AND "routes"."path" = $2 LIMIT $3
 [["source_type", "Namespace"], ["path", "foo/group1"], ["LIMIT", 1]]
D, [2019-08-27T17:41:42.332134 #20175] DEBUG -- :   ↳ app/models/concerns/routable.rb:37
=> nil

Second example with no exact match and a second query to perform case-insensitive search:

[3] pry(..)> Group.find_by_full_path('foo/GROUP1'); nil
D, [2019-08-27T17:41:50.825537 #20175] DEBUG -- :   Group Load (2.3ms)  SELECT  "namespaces".* FROM "namespaces" INNER JOIN "routes" ON "routes"."source_id" = "namespaces"."id" AND "routes"."source_type" = $1 WHERE "namespaces"."type" IN ('Group') AND "routes"."path" = $2 LIMIT $3
 [["source_type", "Namespace"], ["path", "foo/GROUP1"], ["LIMIT", 1]]
D, [2019-08-27T17:41:50.825894 #20175] DEBUG -- :   ↳ app/models/concerns/routable.rb:37
D, [2019-08-27T17:41:50.831699 #20175] DEBUG -- :   Group Load (2.3ms)  SELECT  "namespaces".* FROM "namespaces" INNER JOIN "routes" ON "routes"."source_id" = "namespaces"."id" AND "routes"."source_type" = $1 WHERE "namespaces"."type" IN ('Group') AND ((LOWER(routes.path) = LOWER('foo/GROUP1'))) LIMIT $2  [["source_type", "Namespace"], ["LIMIT", 1]]
D, [2019-08-27T17:41:50.832307 #20175] DEBUG -- :   ↳ app/models/concerns/routable.rb:41
=> nil

Query performance

Before

Plan: https://explain.depesz.com/s/mB3W

SELECT
    "projects".*
FROM
    "projects"
    INNER JOIN "routes" ON "routes"."source_id" = "projects"."id"
        AND "routes"."source_type" = 'Project'
WHERE ((LOWER(routes.path) = LOWER('gitlab-org/gitlab-ce')))
ORDER BY
    (
        CASE WHEN routes.path = 'gitlab-org/gitlab-ce' THEN
            0
        ELSE
            1
        END)
LIMIT 1;

After

Plan: https://explain.depesz.com/s/D90w

SELECT
    "projects".*
FROM
    "projects"
    INNER JOIN "routes" ON "routes"."source_id" = "projects"."id"
        AND "routes"."source_type" = 'Project'
WHERE
    "routes"."path" = 'gitlab-org/gitlab-ce'
LIMIT 1;

This query is a bit simpler and takes a little less time to plan and execute (see the plans for details). Note though we're talking <1ms here in both cases.

Edited by Andreas Brandl

Merge request reports