Negated filtering with non-existing label produces very slow query plans
When issues are filtered with a non-existing label, a very slow query plan is produced.
It's best to start with an example to illustrate the problem.
Suppose we want to filter for issues that don't have idontexist
label and this idontexist
label does NOT exist within gitlab-org
group.
A corresponding issues list url with filters would be:
https://gitlab.com/groups/gitlab-org/-/issues?label_name%5B%5D=section%3A%3Adev&label_name%5B%5D=security¬%5Blabel_name%5D%5B%5D=idontexist
Visiting the above url may produce a query like:
SELECT “issues”.*
FROM “issues”
INNER JOIN “projects” ON “projects”.“id” = “issues”.“project_id”
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE (
NOT EXISTS (
SELECT 1
FROM “banned_users”
WHERE (issues.author_id = banned_users.user_id)
)
)
AND (
issues.confidential IS NOT TRUE
OR (
issues.confidential = TRUE
AND (
issues.author_id = 5961404
OR EXISTS (
SELECT TRUE
FROM issue_assignees
WHERE user_id = 5961404
AND issue_id = issues.id
)
OR EXISTS (
SELECT 1
FROM “project_authorizations”
WHERE “project_authorizations”.“user_id” = 5961404
AND (
project_authorizations.project_id = issues.project_id
)
AND (project_authorizations.access_level >= 20)
)
)
)
)
AND “projects”.“namespace_id” IN (
SELECT “namespaces”.“id”
FROM “namespaces”
WHERE (traversal_ids @> (‘{9970}’))
)
AND (
EXISTS (
SELECT 1
FROM “project_authorizations”
WHERE “project_authorizations”.“user_id” = 5961404
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10)
)
OR projects.visibility_level IN (10, 20)
)
AND (
“project_features”.“issues_access_level” IS NULL
OR “project_features”.“issues_access_level” IN (20, 30)
OR (
“project_features”.“issues_access_level” = 10
AND EXISTS (
SELECT 1
FROM “project_authorizations”
WHERE “project_authorizations”.“user_id” = 5961404
AND (project_authorizations.project_id = projects.id)
AND (project_authorizations.access_level >= 10)
)
)
)
AND (“issues”.“state_id” IN (1))
AND “projects”.“archived” = FALSE
AND EXISTS (
SELECT “label_links”.*
FROM “label_links”
WHERE “label_links”.“target_type” = ‘Issue’
AND “label_links”.“target_id” = “issues”.“id”
AND “label_links”.“label_id” = 14918378
)
AND EXISTS (
SELECT “label_links”.*
FROM “label_links”
WHERE “label_links”.“target_type” = ‘Issue’
AND “label_links”.“target_id” = “issues”.“id”
AND “label_links”.“label_id” IN (2779335, 3482026)
)
AND NOT (
EXISTS (
SELECT “label_links”.*
FROM “label_links”
WHERE “label_links”.“target_type” = ‘Issue’
AND “label_links”.“target_id” = “issues”.“id”
AND 1 = 0
)
)
AND “issues”.“issue_type” IN (0, 1)
AND “projects”.“archived” = FALSE
ORDER BY “issues”.“created_at” DESC,
“issues”.“id” DESC
LIMIT 20 OFFSET 0;
Note one of the conditional clauses in the query:
AND NOT (
EXISTS (
SELECT “label_links”.*
FROM “label_links”
WHERE “label_links”.“target_type” = ‘Issue’
AND “label_links”.“target_id” = “issues”.“id”
AND 1 = 0
)
)
This is equivalent to AND NOT (FALSE)
but using the former produces a very slow query plan (the subquery for retrieving project ids gets materialized and nested under a loop):
If we use AND NOT (FALSE)
, the query plan is much faster:
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/7091/commands/25077
Workarounds
Do not use deleted/non-existing labels as filters.