Improve performance of the global search for issuables
What does this MR do?
In this MR we address a performance issue related to global searches. Closes #59522 (closed).
In https://gitlab.com/gitlab-org/gitlab-ce/commit/577812948dd25129e363862cfcb6d9d21d168cc2 we introduced several checks to ensure that the user had enough access level to access the issuable object. Nevertheless, those checks introduced some additional and heavy queries. The current query and the original one are:
-- Current query in master
SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE (issues.confidential IS NOT TRUE
OR (issues.confidential = TRUE
AND (issues.author_id = 64248
OR EXISTS
(SELECT TRUE
FROM issue_assignees
WHERE user_id = 64248
AND issue_id = issues.id)
OR issues.project_id IN
(SELECT "projects"."id"
FROM "projects"
INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.access_level >= 20)))))
AND (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.project_id = projects.id))
OR projects.visibility_level IN (20))
AND ((projects.visibility_level > 0
AND ("project_features"."issues_access_level" IS NULL
OR "project_features"."issues_access_level" >= (20)
OR ("project_features"."issues_access_level" = 10
AND EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10)))))
OR (projects.visibility_level = 0
AND ("project_features"."issues_access_level" IS NULL
OR "project_features"."issues_access_level" >= 10)
AND EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))))
AND ("issues"."title" ILIKE '%amex%'
OR "issues"."description" ILIKE '%amex%')
ORDER BY updated_at DESC
Query plan: https://explain.depesz.com/s/mwfn
-- Original query before min_access_level change
SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE (issues.confidential IS NOT TRUE
OR (issues.confidential = TRUE
AND (issues.author_id = 64248
OR EXISTS
(SELECT TRUE
FROM issue_assignees
WHERE user_id = 64248
AND issue_id = issues.id)
OR issues.project_id IN
(SELECT "projects"."id"
FROM "projects"
INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.access_level >= 20)))))
AND (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.project_id = projects.id))
OR projects.visibility_level IN (20))
AND ("project_features"."issues_access_level" IN (NULL, 20, 30)
OR ("project_features"."issues_access_level" = 10
AND EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.project_id = projects.id))))
AND ("issues"."title" ILIKE '%amex%'
OR "issues"."description" ILIKE '%amex%')
ORDER BY updated_at DESC
Query plan: https://explain.depesz.com/s/eD8Z
Compared to the original one, we introduce an additional lookup in the project_authorizations
table plus some other lookups.
In the IssuableFinder
, when we retrieve the projects to use for the search of issuable
objects, we retrieve projects where the user has access and public projects. Then, we check which of those projects the user has enough access level.
The idea behind this MR is to retrieve only projects which the user has enough min access level and also the public ones. This way, we can remove the latter checks, and have the same behavior. The current query is:
SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE (issues.confidential IS NOT TRUE
OR (issues.confidential = TRUE
AND (issues.author_id = 64248
OR EXISTS
(SELECT TRUE
FROM issue_assignees
WHERE user_id = 64248
AND issue_id = issues.id)
OR EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.project_id = issues.project_id)
AND (project_authorizations.access_level >= 20)))))
AND (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))
OR projects.visibility_level IN (20))
AND ("project_features"."issues_access_level" IS NULL
OR "project_features"."issues_access_level" IN (20,
30)
OR ("project_features"."issues_access_level" = 10
AND EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))))
AND ("issues"."title" ILIKE '%amex%'
OR "issues"."description" ILIKE '%amex%')
ORDER BY "issues"."updated_at" DESC,
"issues"."id" DESC
Query plan: https://explain.depesz.com/s/kio
This query is very similar to the original one, with only one more query, to check the access level in the project. We also get rid of the IN
we used for getting authorizations when the issue is confidential.
In https://gitlab.com/gitlab-org/gitlab-ce/commit/10ceb33ba271f603fa09d4a4b5fdca03fd7ea333 we also introduce
a CTE in the IssuableFinder
, trying to improve the performance when we were searching for issues from a group namespace. We tried to use the same approach but it didn't go well in this context: https://explain.depesz.com/s/591N.
The longer part of the query is the bitmap scan using trigram
indexes (and that the scan is retrieved from the heap). We have to change from the database perspective how to deal with these searches or, at least, reduce the number of projects to process (like avoiding getting public projects by default).
The MR also removes the ORDER BY
in all the count queries which will help when used with LIMIT
. When we sort records, we can't limit the records until the sorting is done.
Global search with project scope
The current global search queries when we set a project or a group can be improved. At the moment, when we have a project scope, we perform the query over all available projects, check authorizations, ..., and finally, add a where
with the project id.
Passing the project id first to the IssuableFinder
we can get a query like:
SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
WHERE (issues.confidential IS NOT TRUE
OR (issues.confidential = TRUE
AND (issues.author_id = 955795
OR EXISTS
(SELECT TRUE
FROM issue_assignees
WHERE user_id = 955795
AND issue_id = issues.id)
OR EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 955795
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 20)))))
AND "issues"."project_id" = 13083
AND ("issues"."title" ILIKE '%amex%'
OR "issues"."description" ILIKE '%amex%')
ORDER BY "issues"."updated_at" DESC,
"issues"."id" DESC
LIMIT 20
OFFSET 0
The query plan is https://explain.depesz.com/s/VL2G. Compared to the original one, we get a small improvement: 108ms vs 89ms in execution time.
Global search with group scope
Applying the same logic as above. They query we get is:
SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE "projects"."namespace_id" = 9970
AND (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 955795
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))
OR projects.visibility_level IN (20))
AND ("project_features"."issues_access_level" IS NULL
OR "project_features"."issues_access_level" IN (20,
30)
OR ("project_features"."issues_access_level" = 10
AND EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 955795
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))))
AND ("issues"."title" ILIKE '%amex%'
OR "issues"."description" ILIKE '%amex%')
AND "issues"."project_id" IN
(SELECT "projects"."id"
FROM "projects"
INNER JOIN routes rs ON rs.source_id = projects.id
AND rs.source_type = 'Project'
WHERE (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 955795
AND (project_authorizations.project_id = projects.id))
OR projects.visibility_level IN (20))
AND (rs.path LIKE 'gitlab-org/%'))
ORDER BY "issues"."updated_at" DESC,
"issues"."id" DESC
LIMIT 20
OFFSET 0
This query sets the project namespace id, and compared to the original one it seems a good improvement. The query plan for this query is https://explain.depesz.com/s/nhbF, while the original one is https://explain.depesz.com/s/ah3d. The improvement is 6.5s vs 2.5s.
But we can improve this query even more. For that, we need to get rid of the default_project_file
in Gitlab::SearchResults
. Avoiding adding the last where
in https://gitlab.com/gitlab-org/gitlab-ce/blob/fj-59522-improve-search-controller-performance/lib/gitlab/search_results.rb#L98, we get the query:
SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE "projects"."namespace_id" = 9970
AND (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 955795
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))
OR projects.visibility_level IN (20))
AND ("project_features"."issues_access_level" IS NULL
OR "project_features"."issues_access_level" IN (20,
30)
OR ("project_features"."issues_access_level" = 10
AND EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 955795
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))))
AND ("issues"."title" ILIKE '%amex%'
OR "issues"."description" ILIKE '%amex%')
ORDER BY "issues"."updated_at" DESC,
"issues"."id" DESC
LIMIT 20
OFFSET 0
For this change, we need to rethink if we want to allow limiting projects after the finder has done its work or if we can move this logic right inside the finder. We have also to refactor how we retrieve the milestones in the global search, because the required changed would affect this process.
I'll open a follow-up issue for this.
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation created/updated or follow-up review issue created -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Performance and testing
With these changes, we stay very close to the original query but having the behavior. Unfortunately, the performance hasn't improved very much. The original sorting plan is https://explain.depesz.com/s/QUD5 and the new one is https://explain.depesz.com/s/QqCS.
Merge request reports
Activity
changed milestone to %11.11
Reviewer roulette
Changes that require review have been detected! A merge request is normally reviewed by both a reviewer and a maintainer in its primary category (e.g. frontend or backend), and by a maintainer in all other categories.
To spread load more evenly across eligible reviewers, Danger has randomly picked a candidate for each review slot. Feel free to override this selection if you think someone else would be better-suited, or the chosen person is unavailable.
Once you've decided who will review this merge request, mention them as you normally would! Danger does not (yet?) automatically notify them for you.
Category Reviewer Maintainer backend Jarka Košanová ( @jarka
)Michael Kozono ( @mkozono
)Generated by
DangerEdited by 🤖 GitLab Bot 🤖added 2 commits
added 2 commits
added 1 commit
- 6440b7c2 - Refactored a couple of things and added spec
added 1 commit
- 3ce321a1 - Add improvements to the global search process
added 1 commit
- e34d161c - Small refactor and added spec for attemp_global_search_optimizations
marked the checklist item Changelog entry as completed
@reprazent since you work on the original issue, do you mind to take a look? Sorry to flood you with all my MRs
assigned to @reprazent
- Resolved by Francisco Javier López
added 1 commit
- 6f4b4b51 - Add improvements to the global search process
added 1 commit
- dc7a130d - Add improvements to the global search process
- Resolved by Francisco Javier López
@fjsanpedro I noticed in the description we're still doing a
WHERE project_id IN (SELECT projects.id INNER JOIN project_authorizations....)
for confidential issues. Should we also replace this with aEXISTS ( SELECT 1 FROM project_authorizations... )
like we have in the other parts of the query?That part lives in the
IssuesFinder
Edited by Bob Van Landuyt
- Resolved by Francisco Javier López
- Resolved by Bob Van Landuyt
- Resolved by Francisco Javier López
- Resolved by Bob Van Landuyt
- Resolved by Bob Van Landuyt
Nice work @fjsanpedro left some comments, let me know what you think.
assigned to @fjsanpedro
assigned to @reprazent
added 1 commit
- 2e7bbbbb - Add improvements to the global search process
Thanks for the explanations @fjsanpedro replied, to some
.assigned to @fjsanpedro
- Resolved by Francisco Javier López
added 1 commit
- 770c441f - Added scope to project and group global searches
- Resolved by Bob Van Landuyt
@reprazent since the last time you reviewed the MR, I added some really small changes: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/27817/pipelines#global-search-with-group-scope(https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/27817/diffs#aded78be9814c24233a29f0bcc3c06c4fe0ee78a_148_149) and https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/27817#global-search-with-group-scope(https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/27817/diffs#186e2e6784adba7b472cad09062b6d744ddcf264_29_32).
assigned to @reprazent
- Resolved by Bob Van Landuyt
- Resolved by Bob Van Landuyt
Thanks @fjsanpedro just a question left.
Should we update the queries and plans in the description before we move this forward? Does this need separate DB review? Or can it go straight to a maintainer?
assigned to @fjsanpedro and unassigned @reprazent
@abrandl we have been discussing this MR the last few days. Do you mind to take a closer look?
@jameslopez do you mind to take a look?
assigned to @jameslopez and unassigned @fjsanpedro
- Resolved by Francisco Javier López
- Resolved by Francisco Javier López
- Resolved by Francisco Javier López
- Resolved by Francisco Javier López
- Resolved by Francisco Javier López
- Resolved by Francisco Javier López
@fjsanpedro pending DB review, it looks good! Left some minor suggestions.
assigned to @fjsanpedro
unassigned @jameslopez
added 1 commit
- 38919863 - Add improvements to the global search process
mentioned in commit a60cba58
- Resolved by Andreas Brandl
@fjsanpedro Great! Thank you for providing detailed context in the description!
The queries from the description and their respective changes are good IMHO and they even manage to reduce complexity for some queries while improving performance.
Thanks @abrandl!
mentioned in issue gitlab-org/release/tasks#780 (closed)
added typebug label