Improve performance of the global search for issuables
What does this MR do?
In this MR we address a performance issue related to global searches. Closes #59522 (closed).
In https://gitlab.com/gitlab-org/gitlab-ce/commit/577812948dd25129e363862cfcb6d9d21d168cc2 we introduced several checks to ensure that the user had enough access level to access the issuable object. Nevertheless, those checks introduced some additional and heavy queries. The current query and the original one are:
-- Current query in master
SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE (issues.confidential IS NOT TRUE
OR (issues.confidential = TRUE
AND (issues.author_id = 64248
OR EXISTS
(SELECT TRUE
FROM issue_assignees
WHERE user_id = 64248
AND issue_id = issues.id)
OR issues.project_id IN
(SELECT "projects"."id"
FROM "projects"
INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.access_level >= 20)))))
AND (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.project_id = projects.id))
OR projects.visibility_level IN (20))
AND ((projects.visibility_level > 0
AND ("project_features"."issues_access_level" IS NULL
OR "project_features"."issues_access_level" >= (20)
OR ("project_features"."issues_access_level" = 10
AND EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10)))))
OR (projects.visibility_level = 0
AND ("project_features"."issues_access_level" IS NULL
OR "project_features"."issues_access_level" >= 10)
AND EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))))
AND ("issues"."title" ILIKE '%amex%'
OR "issues"."description" ILIKE '%amex%')
ORDER BY updated_at DESC
Query plan: https://explain.depesz.com/s/mwfn
-- Original query before min_access_level change
SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE (issues.confidential IS NOT TRUE
OR (issues.confidential = TRUE
AND (issues.author_id = 64248
OR EXISTS
(SELECT TRUE
FROM issue_assignees
WHERE user_id = 64248
AND issue_id = issues.id)
OR issues.project_id IN
(SELECT "projects"."id"
FROM "projects"
INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.access_level >= 20)))))
AND (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.project_id = projects.id))
OR projects.visibility_level IN (20))
AND ("project_features"."issues_access_level" IN (NULL, 20, 30)
OR ("project_features"."issues_access_level" = 10
AND EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.project_id = projects.id))))
AND ("issues"."title" ILIKE '%amex%'
OR "issues"."description" ILIKE '%amex%')
ORDER BY updated_at DESC
Query plan: https://explain.depesz.com/s/eD8Z
Compared to the original one, we introduce an additional lookup in the project_authorizations
table plus some other lookups.
In the IssuableFinder
, when we retrieve the projects to use for the search of issuable
objects, we retrieve projects where the user has access and public projects. Then, we check which of those projects the user has enough access level.
The idea behind this MR is to retrieve only projects which the user has enough min access level and also the public ones. This way, we can remove the latter checks, and have the same behavior. The current query is:
SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE (issues.confidential IS NOT TRUE
OR (issues.confidential = TRUE
AND (issues.author_id = 64248
OR EXISTS
(SELECT TRUE
FROM issue_assignees
WHERE user_id = 64248
AND issue_id = issues.id)
OR EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.project_id = issues.project_id)
AND (project_authorizations.access_level >= 20)))))
AND (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))
OR projects.visibility_level IN (20))
AND ("project_features"."issues_access_level" IS NULL
OR "project_features"."issues_access_level" IN (20,
30)
OR ("project_features"."issues_access_level" = 10
AND EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 64248
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))))
AND ("issues"."title" ILIKE '%amex%'
OR "issues"."description" ILIKE '%amex%')
ORDER BY "issues"."updated_at" DESC,
"issues"."id" DESC
Query plan: https://explain.depesz.com/s/kio
This query is very similar to the original one, with only one more query, to check the access level in the project. We also get rid of the IN
we used for getting authorizations when the issue is confidential.
In https://gitlab.com/gitlab-org/gitlab-ce/commit/10ceb33ba271f603fa09d4a4b5fdca03fd7ea333 we also introduce
a CTE in the IssuableFinder
, trying to improve the performance when we were searching for issues from a group namespace. We tried to use the same approach but it didn't go well in this context: https://explain.depesz.com/s/591N.
The longer part of the query is the bitmap scan using trigram
indexes (and that the scan is retrieved from the heap). We have to change from the database perspective how to deal with these searches or, at least, reduce the number of projects to process (like avoiding getting public projects by default).
The MR also removes the ORDER BY
in all the count queries which will help when used with LIMIT
. When we sort records, we can't limit the records until the sorting is done.
Global search with project scope
The current global search queries when we set a project or a group can be improved. At the moment, when we have a project scope, we perform the query over all available projects, check authorizations, ..., and finally, add a where
with the project id.
Passing the project id first to the IssuableFinder
we can get a query like:
SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
WHERE (issues.confidential IS NOT TRUE
OR (issues.confidential = TRUE
AND (issues.author_id = 955795
OR EXISTS
(SELECT TRUE
FROM issue_assignees
WHERE user_id = 955795
AND issue_id = issues.id)
OR EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 955795
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 20)))))
AND "issues"."project_id" = 13083
AND ("issues"."title" ILIKE '%amex%'
OR "issues"."description" ILIKE '%amex%')
ORDER BY "issues"."updated_at" DESC,
"issues"."id" DESC
LIMIT 20
OFFSET 0
The query plan is https://explain.depesz.com/s/VL2G. Compared to the original one, we get a small improvement: 108ms vs 89ms in execution time.
Global search with group scope
Applying the same logic as above. They query we get is:
SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE "projects"."namespace_id" = 9970
AND (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 955795
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))
OR projects.visibility_level IN (20))
AND ("project_features"."issues_access_level" IS NULL
OR "project_features"."issues_access_level" IN (20,
30)
OR ("project_features"."issues_access_level" = 10
AND EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 955795
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))))
AND ("issues"."title" ILIKE '%amex%'
OR "issues"."description" ILIKE '%amex%')
AND "issues"."project_id" IN
(SELECT "projects"."id"
FROM "projects"
INNER JOIN routes rs ON rs.source_id = projects.id
AND rs.source_type = 'Project'
WHERE (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 955795
AND (project_authorizations.project_id = projects.id))
OR projects.visibility_level IN (20))
AND (rs.path LIKE 'gitlab-org/%'))
ORDER BY "issues"."updated_at" DESC,
"issues"."id" DESC
LIMIT 20
OFFSET 0
This query sets the project namespace id, and compared to the original one it seems a good improvement. The query plan for this query is https://explain.depesz.com/s/nhbF, while the original one is https://explain.depesz.com/s/ah3d. The improvement is 6.5s vs 2.5s.
But we can improve this query even more. For that, we need to get rid of the default_project_file
in Gitlab::SearchResults
. Avoiding adding the last where
in https://gitlab.com/gitlab-org/gitlab-ce/blob/fj-59522-improve-search-controller-performance/lib/gitlab/search_results.rb#L98, we get the query:
SELECT "issues".*
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE "projects"."namespace_id" = 9970
AND (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 955795
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))
OR projects.visibility_level IN (20))
AND ("project_features"."issues_access_level" IS NULL
OR "project_features"."issues_access_level" IN (20,
30)
OR ("project_features"."issues_access_level" = 10
AND EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 955795
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10))))
AND ("issues"."title" ILIKE '%amex%'
OR "issues"."description" ILIKE '%amex%')
ORDER BY "issues"."updated_at" DESC,
"issues"."id" DESC
LIMIT 20
OFFSET 0
For this change, we need to rethink if we want to allow limiting projects after the finder has done its work or if we can move this logic right inside the finder. We have also to refactor how we retrieve the milestones in the global search, because the required changed would affect this process.
I'll open a follow-up issue for this.
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation created/updated or follow-up review issue created -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Performance and testing
With these changes, we stay very close to the original query but having the behavior. Unfortunately, the performance hasn't improved very much. The original sorting plan is https://explain.depesz.com/s/QUD5 and the new one is https://explain.depesz.com/s/QqCS.