WIP: Improve snippet search performance
What does this MR do?
Improve performance of snippets search. The original plan is https://explain.depesz.com/s/JXHr and the original query is:
Original Query
SELECT "snippets".*
FROM
(SELECT "snippets".*
FROM "snippets"
WHERE (snippets.visibility_level IN (0,
10,
20)
OR snippets.author_id = 955795)
AND "snippets"."project_id" IS NULL
UNION SELECT "snippets".*
FROM "snippets"
INNER JOIN "projects" ON "projects"."id" = "snippets"."project_id"
INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
WHERE (snippets.visibility_level IN (0,
10,
20)
OR snippets.author_id = 955795)
AND (projects.visibility_level IN (0,
10,
20))
AND "project_features"."snippets_access_level" IN (20,
30)
UNION SELECT "snippets".*
FROM "snippets"
INNER JOIN "projects" ON "projects"."id" = "snippets"."project_id"
INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
WHERE "project_features"."snippets_access_level" IN (20,
30,
10)
AND (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE (project_id = snippets.project_id)
AND "project_authorizations"."user_id" = 1))) snippets
WHERE "snippets"."content" ILIKE '%asd%'
ORDER BY created_at DESC,
updated_at DESC
LIMIT 1000
Admin user
When the user is an admin, we include several subqueries that are not really necessary. For example:
SELECT "snippets".*
FROM "snippets"
WHERE (snippets.visibility_level IN (0,
10,
20)
OR snippets.author_id = 955795)
AND "snippets"."project_id" IS NULL
We're fetching here snippets with any visibility or if the author_id
matches the current user. If we retrieve any record no matter the visibility, we doesn't need to check for the author, because the snippet will be already fetched from the first condition.
Therefore we can reduce the query to:
SELECT "snippets".*
FROM "snippets"
WHERE "snippets"."project_id" IS NULL
The new plan is https://explain.depesz.com/s/5pMx. The execution time is not representative compared to the original plan and is almost the same because underneath we're making the same request. Nevertheless, the planning time has been reduced.
Based on the new query, we have a similar problem with project visibility:
SELECT "snippets".*
FROM "snippets"
INNER JOIN "projects" ON "projects"."id" = "snippets"."project_id"
INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
WHERE (projects.visibility_level IN (0,
10,
20))
AND "project_features"."snippets_access_level" IN (20,
30)
Here we retrieve all projects no matter the visibility_level
and the check the snippet_access_level
. We can remove the first condition since it retrieves all the projets basically:
SELECT "snippets".*
FROM "snippets"
INNER JOIN "projects" ON "projects"."id" = "snippets"."project_id"
INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
WHERE "project_features"."snippets_access_level" IN (20,
30)
The new plan is https://explain.depesz.com/s/oOh7
Refs https://gitlab.com/gitlab-org/gitlab-ee/issues/26123
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry for user-facing changes, or community contribution. Check the link for other scenarios. -
Documentation created/updated or follow-up review issue created -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content