Skip to content

Improve performance of the global search for issuables

What does this MR do?

In this MR we address a performance issue related to global searches. Closes #59522 (closed).

In https://gitlab.com/gitlab-org/gitlab-ce/commit/577812948dd25129e363862cfcb6d9d21d168cc2 we introduced several checks to ensure that the user had enough access level to access the issuable object. Nevertheless, those checks introduced some additional and heavy queries. The current query and the original one are:

-- Current query in master
SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE (issues.confidential IS NOT TRUE
       OR (issues.confidential = TRUE
           AND (issues.author_id = 64248
                OR EXISTS
                  (SELECT TRUE
                   FROM issue_assignees
                   WHERE user_id = 64248
                     AND issue_id = issues.id)
                OR issues.project_id IN
                  (SELECT "projects"."id"
                   FROM "projects"
                   INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id"
                   WHERE "project_authorizations"."user_id" = 64248
                     AND (project_authorizations.access_level >= 20)))))
  AND (EXISTS
         (SELECT 1
          FROM "project_authorizations"
          WHERE "project_authorizations"."user_id" = 64248
            AND (project_authorizations.project_id = projects.id))
       OR projects.visibility_level IN (20))
  AND ((projects.visibility_level > 0
        AND ("project_features"."issues_access_level" IS NULL
             OR "project_features"."issues_access_level" >= (20)
             OR ("project_features"."issues_access_level" = 10
                 AND EXISTS
                   (SELECT 1
                    FROM "project_authorizations"
                    WHERE "project_authorizations"."user_id" = 64248
                      AND (project_authorizations.project_id = projects.id)
                      AND (project_authorizations.access_level >= 10)))))
       OR (projects.visibility_level = 0
           AND ("project_features"."issues_access_level" IS NULL
                OR "project_features"."issues_access_level" >= 10)
           AND EXISTS
             (SELECT 1
              FROM "project_authorizations"
              WHERE "project_authorizations"."user_id" = 64248
                AND (project_authorizations.project_id = projects.id)
                AND (project_authorizations.access_level >= 10))))
  AND ("issues"."title" ILIKE '%amex%'
       OR "issues"."description" ILIKE '%amex%')
ORDER BY updated_at DESC

Query plan: https://explain.depesz.com/s/mwfn

-- Original query before min_access_level change
SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE (issues.confidential IS NOT TRUE
       OR (issues.confidential = TRUE
           AND (issues.author_id = 64248
                OR EXISTS
                  (SELECT TRUE
                   FROM issue_assignees
                   WHERE user_id = 64248
                     AND issue_id = issues.id)
                OR issues.project_id IN
                  (SELECT "projects"."id"
                   FROM "projects"
                   INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id"
                   WHERE "project_authorizations"."user_id" = 64248
                     AND (project_authorizations.access_level >= 20)))))
  AND (EXISTS
         (SELECT 1
          FROM "project_authorizations"
          WHERE "project_authorizations"."user_id" = 64248
            AND (project_authorizations.project_id = projects.id))
       OR projects.visibility_level IN (20))
  AND ("project_features"."issues_access_level" IN (NULL, 20, 30)
       OR ("project_features"."issues_access_level" = 10
           AND EXISTS
             (SELECT 1
              FROM "project_authorizations"
              WHERE "project_authorizations"."user_id" = 64248
                AND (project_authorizations.project_id = projects.id))))
  AND ("issues"."title" ILIKE '%amex%'
       OR "issues"."description" ILIKE '%amex%')
ORDER BY updated_at DESC

Query plan: https://explain.depesz.com/s/eD8Z

Compared to the original one, we introduce an additional lookup in the project_authorizations table plus some other lookups.

In the IssuableFinder, when we retrieve the projects to use for the search of issuable objects, we retrieve projects where the user has access and public projects. Then, we check which of those projects the user has enough access level.

The idea behind this MR is to retrieve only projects which the user has enough min access level and also the public ones. This way, we can remove the latter checks, and have the same behavior. The current query is:

SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE (issues.confidential IS NOT TRUE
       OR (issues.confidential = TRUE
           AND (issues.author_id = 64248
                OR EXISTS
                  (SELECT TRUE
                   FROM issue_assignees
                   WHERE user_id = 64248
                     AND issue_id = issues.id)
                OR EXISTS
                  (SELECT 1
                   FROM "project_authorizations"
                   WHERE "project_authorizations"."user_id" = 64248
                     AND (project_authorizations.project_id = issues.project_id)
                     AND (project_authorizations.access_level >= 20)))))
  AND (EXISTS
         (SELECT 1
          FROM "project_authorizations"
          WHERE "project_authorizations"."user_id" = 64248
            AND (project_authorizations.project_id = projects.id)
            AND (project_authorizations.access_level >= 10))
       OR projects.visibility_level IN (20))
  AND ("project_features"."issues_access_level" IS NULL
       OR "project_features"."issues_access_level" IN (20,
                                                       30)
       OR ("project_features"."issues_access_level" = 10
           AND EXISTS
             (SELECT 1
              FROM "project_authorizations"
              WHERE "project_authorizations"."user_id" = 64248
                AND (project_authorizations.project_id = projects.id)
                AND (project_authorizations.access_level >= 10))))
  AND ("issues"."title" ILIKE '%amex%'
       OR "issues"."description" ILIKE '%amex%')
ORDER BY "issues"."updated_at" DESC,
         "issues"."id" DESC

Query plan: https://explain.depesz.com/s/kio

This query is very similar to the original one, with only one more query, to check the access level in the project. We also get rid of the IN we used for getting authorizations when the issue is confidential.

In https://gitlab.com/gitlab-org/gitlab-ce/commit/10ceb33ba271f603fa09d4a4b5fdca03fd7ea333 we also introduce a CTE in the IssuableFinder, trying to improve the performance when we were searching for issues from a group namespace. We tried to use the same approach but it didn't go well in this context: https://explain.depesz.com/s/591N.

The longer part of the query is the bitmap scan using trigram indexes (and that the scan is retrieved from the heap). We have to change from the database perspective how to deal with these searches or, at least, reduce the number of projects to process (like avoiding getting public projects by default).

The MR also removes the ORDER BY in all the count queries which will help when used with LIMIT. When we sort records, we can't limit the records until the sorting is done.

Global search with project scope

The current global search queries when we set a project or a group can be improved. At the moment, when we have a project scope, we perform the query over all available projects, check authorizations, ..., and finally, add a where with the project id.

Passing the project id first to the IssuableFinder we can get a query like:

SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
WHERE (issues.confidential IS NOT TRUE
       OR (issues.confidential = TRUE
           AND (issues.author_id = 955795
                OR EXISTS
                  (SELECT TRUE
                   FROM issue_assignees
                   WHERE user_id = 955795
                     AND issue_id = issues.id)
                OR EXISTS
                  (SELECT 1
                   FROM "project_authorizations"
                   WHERE "project_authorizations"."user_id" = 955795
                     AND (project_authorizations.project_id = projects.id)
                     AND (project_authorizations.access_level >= 20)))))
  AND "issues"."project_id" = 13083
  AND ("issues"."title" ILIKE '%amex%'
       OR "issues"."description" ILIKE '%amex%')
ORDER BY "issues"."updated_at" DESC,
         "issues"."id" DESC
LIMIT 20
OFFSET 0

The query plan is https://explain.depesz.com/s/VL2G. Compared to the original one, we get a small improvement: 108ms vs 89ms in execution time.

Global search with group scope

Applying the same logic as above. They query we get is:

SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE "projects"."namespace_id" = 9970
  AND (EXISTS
         (SELECT 1
          FROM "project_authorizations"
          WHERE "project_authorizations"."user_id" = 955795
            AND (project_authorizations.project_id = projects.id)
            AND (project_authorizations.access_level >= 10))
       OR projects.visibility_level IN (20))
  AND ("project_features"."issues_access_level" IS NULL
       OR "project_features"."issues_access_level" IN (20,
                                                       30)
       OR ("project_features"."issues_access_level" = 10
           AND EXISTS
             (SELECT 1
              FROM "project_authorizations"
              WHERE "project_authorizations"."user_id" = 955795
                AND (project_authorizations.project_id = projects.id)
                AND (project_authorizations.access_level >= 10))))
  AND ("issues"."title" ILIKE '%amex%'
       OR "issues"."description" ILIKE '%amex%')
  AND "issues"."project_id" IN
    (SELECT "projects"."id"
     FROM "projects"
     INNER JOIN routes rs ON rs.source_id = projects.id
     AND rs.source_type = 'Project'
     WHERE (EXISTS
              (SELECT 1
               FROM "project_authorizations"
               WHERE "project_authorizations"."user_id" = 955795
                 AND (project_authorizations.project_id = projects.id))
            OR projects.visibility_level IN (20))
       AND (rs.path LIKE 'gitlab-org/%'))
ORDER BY "issues"."updated_at" DESC,
         "issues"."id" DESC
LIMIT 20
OFFSET 0

This query sets the project namespace id, and compared to the original one it seems a good improvement. The query plan for this query is https://explain.depesz.com/s/nhbF, while the original one is https://explain.depesz.com/s/ah3d. The improvement is 6.5s vs 2.5s.

But we can improve this query even more. For that, we need to get rid of the default_project_file in Gitlab::SearchResults. Avoiding adding the last where in https://gitlab.com/gitlab-org/gitlab-ce/blob/fj-59522-improve-search-controller-performance/lib/gitlab/search_results.rb#L98, we get the query:

SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE "projects"."namespace_id" = 9970
  AND (EXISTS
         (SELECT 1
          FROM "project_authorizations"
          WHERE "project_authorizations"."user_id" = 955795
            AND (project_authorizations.project_id = projects.id)
            AND (project_authorizations.access_level >= 10))
       OR projects.visibility_level IN (20))
  AND ("project_features"."issues_access_level" IS NULL
       OR "project_features"."issues_access_level" IN (20,
                                                       30)
       OR ("project_features"."issues_access_level" = 10
           AND EXISTS
             (SELECT 1
              FROM "project_authorizations"
              WHERE "project_authorizations"."user_id" = 955795
                AND (project_authorizations.project_id = projects.id)
                AND (project_authorizations.access_level >= 10))))
  AND ("issues"."title" ILIKE '%amex%'
       OR "issues"."description" ILIKE '%amex%')
ORDER BY "issues"."updated_at" DESC,
         "issues"."id" DESC
LIMIT 20
OFFSET 0

For this change, we need to rethink if we want to allow limiting projects after the finder has done its work or if we can move this logic right inside the finder. We have also to refactor how we retrieve the milestones in the global search, because the required changed would affect this process.

I'll open a follow-up issue for this.

Does this MR meet the acceptance criteria?

Conformity

Performance and testing

With these changes, we stay very close to the original query but having the behavior. Unfortunately, the performance hasn't improved very much. The original sorting plan is https://explain.depesz.com/s/QUD5 and the new one is https://explain.depesz.com/s/QqCS.

Merge request reports