Skip to content
Snippets Groups Projects

Improve performance of the global search for issuables

All threads resolved!

What does this MR do?

In this MR we address a performance issue related to global searches. Closes #59522 (closed).

In https://gitlab.com/gitlab-org/gitlab-ce/commit/577812948dd25129e363862cfcb6d9d21d168cc2 we introduced several checks to ensure that the user had enough access level to access the issuable object. Nevertheless, those checks introduced some additional and heavy queries. The current query and the original one are:

-- Current query in master
SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE (issues.confidential IS NOT TRUE
       OR (issues.confidential = TRUE
           AND (issues.author_id = 64248
                OR EXISTS
                  (SELECT TRUE
                   FROM issue_assignees
                   WHERE user_id = 64248
                     AND issue_id = issues.id)
                OR issues.project_id IN
                  (SELECT "projects"."id"
                   FROM "projects"
                   INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id"
                   WHERE "project_authorizations"."user_id" = 64248
                     AND (project_authorizations.access_level >= 20)))))
  AND (EXISTS
         (SELECT 1
          FROM "project_authorizations"
          WHERE "project_authorizations"."user_id" = 64248
            AND (project_authorizations.project_id = projects.id))
       OR projects.visibility_level IN (20))
  AND ((projects.visibility_level > 0
        AND ("project_features"."issues_access_level" IS NULL
             OR "project_features"."issues_access_level" >= (20)
             OR ("project_features"."issues_access_level" = 10
                 AND EXISTS
                   (SELECT 1
                    FROM "project_authorizations"
                    WHERE "project_authorizations"."user_id" = 64248
                      AND (project_authorizations.project_id = projects.id)
                      AND (project_authorizations.access_level >= 10)))))
       OR (projects.visibility_level = 0
           AND ("project_features"."issues_access_level" IS NULL
                OR "project_features"."issues_access_level" >= 10)
           AND EXISTS
             (SELECT 1
              FROM "project_authorizations"
              WHERE "project_authorizations"."user_id" = 64248
                AND (project_authorizations.project_id = projects.id)
                AND (project_authorizations.access_level >= 10))))
  AND ("issues"."title" ILIKE '%amex%'
       OR "issues"."description" ILIKE '%amex%')
ORDER BY updated_at DESC

Query plan: https://explain.depesz.com/s/mwfn

-- Original query before min_access_level change
SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE (issues.confidential IS NOT TRUE
       OR (issues.confidential = TRUE
           AND (issues.author_id = 64248
                OR EXISTS
                  (SELECT TRUE
                   FROM issue_assignees
                   WHERE user_id = 64248
                     AND issue_id = issues.id)
                OR issues.project_id IN
                  (SELECT "projects"."id"
                   FROM "projects"
                   INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id"
                   WHERE "project_authorizations"."user_id" = 64248
                     AND (project_authorizations.access_level >= 20)))))
  AND (EXISTS
         (SELECT 1
          FROM "project_authorizations"
          WHERE "project_authorizations"."user_id" = 64248
            AND (project_authorizations.project_id = projects.id))
       OR projects.visibility_level IN (20))
  AND ("project_features"."issues_access_level" IN (NULL, 20, 30)
       OR ("project_features"."issues_access_level" = 10
           AND EXISTS
             (SELECT 1
              FROM "project_authorizations"
              WHERE "project_authorizations"."user_id" = 64248
                AND (project_authorizations.project_id = projects.id))))
  AND ("issues"."title" ILIKE '%amex%'
       OR "issues"."description" ILIKE '%amex%')
ORDER BY updated_at DESC

Query plan: https://explain.depesz.com/s/eD8Z

Compared to the original one, we introduce an additional lookup in the project_authorizations table plus some other lookups.

In the IssuableFinder, when we retrieve the projects to use for the search of issuable objects, we retrieve projects where the user has access and public projects. Then, we check which of those projects the user has enough access level.

The idea behind this MR is to retrieve only projects which the user has enough min access level and also the public ones. This way, we can remove the latter checks, and have the same behavior. The current query is:

SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE (issues.confidential IS NOT TRUE
       OR (issues.confidential = TRUE
           AND (issues.author_id = 64248
                OR EXISTS
                  (SELECT TRUE
                   FROM issue_assignees
                   WHERE user_id = 64248
                     AND issue_id = issues.id)
                OR EXISTS
                  (SELECT 1
                   FROM "project_authorizations"
                   WHERE "project_authorizations"."user_id" = 64248
                     AND (project_authorizations.project_id = issues.project_id)
                     AND (project_authorizations.access_level >= 20)))))
  AND (EXISTS
         (SELECT 1
          FROM "project_authorizations"
          WHERE "project_authorizations"."user_id" = 64248
            AND (project_authorizations.project_id = projects.id)
            AND (project_authorizations.access_level >= 10))
       OR projects.visibility_level IN (20))
  AND ("project_features"."issues_access_level" IS NULL
       OR "project_features"."issues_access_level" IN (20,
                                                       30)
       OR ("project_features"."issues_access_level" = 10
           AND EXISTS
             (SELECT 1
              FROM "project_authorizations"
              WHERE "project_authorizations"."user_id" = 64248
                AND (project_authorizations.project_id = projects.id)
                AND (project_authorizations.access_level >= 10))))
  AND ("issues"."title" ILIKE '%amex%'
       OR "issues"."description" ILIKE '%amex%')
ORDER BY "issues"."updated_at" DESC,
         "issues"."id" DESC

Query plan: https://explain.depesz.com/s/kio

This query is very similar to the original one, with only one more query, to check the access level in the project. We also get rid of the IN we used for getting authorizations when the issue is confidential.

In https://gitlab.com/gitlab-org/gitlab-ce/commit/10ceb33ba271f603fa09d4a4b5fdca03fd7ea333 we also introduce a CTE in the IssuableFinder, trying to improve the performance when we were searching for issues from a group namespace. We tried to use the same approach but it didn't go well in this context: https://explain.depesz.com/s/591N.

The longer part of the query is the bitmap scan using trigram indexes (and that the scan is retrieved from the heap). We have to change from the database perspective how to deal with these searches or, at least, reduce the number of projects to process (like avoiding getting public projects by default).

The MR also removes the ORDER BY in all the count queries which will help when used with LIMIT. When we sort records, we can't limit the records until the sorting is done.

Global search with project scope

The current global search queries when we set a project or a group can be improved. At the moment, when we have a project scope, we perform the query over all available projects, check authorizations, ..., and finally, add a where with the project id.

Passing the project id first to the IssuableFinder we can get a query like:

SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
WHERE (issues.confidential IS NOT TRUE
       OR (issues.confidential = TRUE
           AND (issues.author_id = 955795
                OR EXISTS
                  (SELECT TRUE
                   FROM issue_assignees
                   WHERE user_id = 955795
                     AND issue_id = issues.id)
                OR EXISTS
                  (SELECT 1
                   FROM "project_authorizations"
                   WHERE "project_authorizations"."user_id" = 955795
                     AND (project_authorizations.project_id = projects.id)
                     AND (project_authorizations.access_level >= 20)))))
  AND "issues"."project_id" = 13083
  AND ("issues"."title" ILIKE '%amex%'
       OR "issues"."description" ILIKE '%amex%')
ORDER BY "issues"."updated_at" DESC,
         "issues"."id" DESC
LIMIT 20
OFFSET 0

The query plan is https://explain.depesz.com/s/VL2G. Compared to the original one, we get a small improvement: 108ms vs 89ms in execution time.

Global search with group scope

Applying the same logic as above. They query we get is:

SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE "projects"."namespace_id" = 9970
  AND (EXISTS
         (SELECT 1
          FROM "project_authorizations"
          WHERE "project_authorizations"."user_id" = 955795
            AND (project_authorizations.project_id = projects.id)
            AND (project_authorizations.access_level >= 10))
       OR projects.visibility_level IN (20))
  AND ("project_features"."issues_access_level" IS NULL
       OR "project_features"."issues_access_level" IN (20,
                                                       30)
       OR ("project_features"."issues_access_level" = 10
           AND EXISTS
             (SELECT 1
              FROM "project_authorizations"
              WHERE "project_authorizations"."user_id" = 955795
                AND (project_authorizations.project_id = projects.id)
                AND (project_authorizations.access_level >= 10))))
  AND ("issues"."title" ILIKE '%amex%'
       OR "issues"."description" ILIKE '%amex%')
  AND "issues"."project_id" IN
    (SELECT "projects"."id"
     FROM "projects"
     INNER JOIN routes rs ON rs.source_id = projects.id
     AND rs.source_type = 'Project'
     WHERE (EXISTS
              (SELECT 1
               FROM "project_authorizations"
               WHERE "project_authorizations"."user_id" = 955795
                 AND (project_authorizations.project_id = projects.id))
            OR projects.visibility_level IN (20))
       AND (rs.path LIKE 'gitlab-org/%'))
ORDER BY "issues"."updated_at" DESC,
         "issues"."id" DESC
LIMIT 20
OFFSET 0

This query sets the project namespace id, and compared to the original one it seems a good improvement. The query plan for this query is https://explain.depesz.com/s/nhbF, while the original one is https://explain.depesz.com/s/ah3d. The improvement is 6.5s vs 2.5s.

But we can improve this query even more. For that, we need to get rid of the default_project_file in Gitlab::SearchResults. Avoiding adding the last where in https://gitlab.com/gitlab-org/gitlab-ce/blob/fj-59522-improve-search-controller-performance/lib/gitlab/search_results.rb#L98, we get the query:

SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE "projects"."namespace_id" = 9970
  AND (EXISTS
         (SELECT 1
          FROM "project_authorizations"
          WHERE "project_authorizations"."user_id" = 955795
            AND (project_authorizations.project_id = projects.id)
            AND (project_authorizations.access_level >= 10))
       OR projects.visibility_level IN (20))
  AND ("project_features"."issues_access_level" IS NULL
       OR "project_features"."issues_access_level" IN (20,
                                                       30)
       OR ("project_features"."issues_access_level" = 10
           AND EXISTS
             (SELECT 1
              FROM "project_authorizations"
              WHERE "project_authorizations"."user_id" = 955795
                AND (project_authorizations.project_id = projects.id)
                AND (project_authorizations.access_level >= 10))))
  AND ("issues"."title" ILIKE '%amex%'
       OR "issues"."description" ILIKE '%amex%')
ORDER BY "issues"."updated_at" DESC,
         "issues"."id" DESC
LIMIT 20
OFFSET 0

For this change, we need to rethink if we want to allow limiting projects after the finder has done its work or if we can move this logic right inside the finder. We have also to refactor how we retrieve the milestones in the global search, because the required changed would affect this process.

I'll open a follow-up issue for this.

Does this MR meet the acceptance criteria?

Conformity

Performance and testing

With these changes, we stay very close to the original query but having the behavior. Unfortunately, the performance hasn't improved very much. The original sorting plan is https://explain.depesz.com/s/QUD5 and the new one is https://explain.depesz.com/s/QqCS.

Edited by Francisco Javier López

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • added 1 commit

    • 6f4b4b51 - Add improvements to the global search process

    Compare with previous version

  • Francisco Javier López changed the description

    changed the description

  • added 1 commit

    • dc7a130d - Add improvements to the global search process

    Compare with previous version

  • Francisco Javier López changed the description

    changed the description

  • Bob Van Landuyt
  • added 1 commit

    • 9cedf78f - Code review comments applied

    Compare with previous version

  • added 1 commit

    • 2e7bbbbb - Add improvements to the global search process

    Compare with previous version

  • Thanks for the explanations @fjsanpedro replied, to some :smile:.

  • added 1 commit

    • 770c441f - Added scope to project and group global searches

    Compare with previous version

  • Francisco Javier López changed the description

    changed the description

  • Francisco Javier López
  • added 1 commit

    • 64d3ba07 - More code review comments addressed

    Compare with previous version

  • added 1 commit

    Compare with previous version

  • Francisco Javier López changed the description

    changed the description

  • Bob Van Landuyt resolved all discussions

    resolved all discussions

  • Bob Van Landuyt
  • Bob Van Landuyt
  • Thanks @fjsanpedro just a question left.

    Should we update the queries and plans in the description before we move this forward? Does this need separate DB review? Or can it go straight to a maintainer?

  • assigned to @fjsanpedro and unassigned @reprazent

  • @abrandl we have been discussing this MR the last few days. Do you mind to take a closer look?

  • Bob Van Landuyt resolved all discussions

    resolved all discussions

  • Francisco Javier López resolved all discussions

    resolved all discussions

  • Francisco Javier López changed the description

    changed the description

  • Bob Van Landuyt approved this merge request

    approved this merge request

  • @jameslopez do you mind to take a look?

  • assigned to @jameslopez and unassigned @fjsanpedro

  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • @fjsanpedro pending DB review, it looks good! Left some minor suggestions.

  • added 1 commit

    • f1fff08b - Code review comments applied

    Compare with previous version

  • added 1 commit

    • 38919863 - Add improvements to the global search process

    Compare with previous version

  • Francisco Javier López resolved all discussions

    resolved all discussions

  • merged

  • James Lopez mentioned in commit a60cba58

    mentioned in commit a60cba58

  • Thanks Fran!

  • Andreas Brandl
  • Andreas Brandl changed the description

    changed the description

  • Francisco Javier López changed the description

    changed the description

  • Andreas Brandl resolved all discussions

    resolved all discussions

  • @fjsanpedro Great! Thank you for providing detailed context in the description! :thumbsup: :100:

    The queries from the description and their respective changes are good IMHO and they even manage to reduce complexity for some queries while improving performance. :thumbsup:

  • added typebug label

  • Please register or sign in to reply
    Loading