Negated filtering with non-existing label produces very slow query plans

When issues are filtered with a non-existing label, a very slow query plan is produced.

It's best to start with an example to illustrate the problem.

Suppose we want to filter for issues that don't have idontexist label and this idontexist label does NOT exist within gitlab-org group.

A corresponding issues list url with filters would be:

https://gitlab.com/groups/gitlab-org/-/issues?label_name%5B%5D=section%3A%3Adev&label_name%5B%5D=security&not%5Blabel_name%5D%5B%5D=idontexist

Visiting the above url may produce a query like:

SELECT “issues”.*
FROM “issues”
	INNER JOIN “projects” ON “projects”.“id” = “issues”.“project_id”
	LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE (
		NOT EXISTS (
			SELECT 1
			FROM “banned_users”
			WHERE (issues.author_id = banned_users.user_id)
		)
	)
	AND (
		issues.confidential IS NOT TRUE
		OR (
			issues.confidential = TRUE
			AND (
				issues.author_id = 5961404
				OR EXISTS (
					SELECT TRUE
					FROM issue_assignees
					WHERE user_id = 5961404
						AND issue_id = issues.id
				)
				OR EXISTS (
					SELECT 1
					FROM “project_authorizations”
					WHERE “project_authorizations”.“user_id” = 5961404
						AND (
							project_authorizations.project_id = issues.project_id
						)
						AND (project_authorizations.access_level >= 20)
				)
			)
		)
	)
	AND “projects”.“namespace_id” IN (
		SELECT “namespaces”.“id”
		FROM “namespaces”
		WHERE (traversal_ids @> (‘{9970}’))
	)
	AND (
		EXISTS (
			SELECT 1
			FROM “project_authorizations”
			WHERE “project_authorizations”.“user_id” = 5961404
				AND (project_authorizations.project_id = projects.id)
				AND (project_authorizations.access_level >= 10)
		)
		OR projects.visibility_level IN (10, 20)
	)
	AND (
		“project_features”.“issues_access_level” IS NULL
		OR “project_features”.“issues_access_level” IN (20, 30)
		OR (
			“project_features”.“issues_access_level” = 10
			AND EXISTS (
				SELECT 1
				FROM “project_authorizations”
				WHERE “project_authorizations”.“user_id” = 5961404
					AND (project_authorizations.project_id = projects.id)
					AND (project_authorizations.access_level >= 10)
			)
		)
	)
	AND (“issues”.“state_id” IN (1))
	AND “projects”.“archived” = FALSE
	AND EXISTS (
		SELECT “label_links”.*
		FROM “label_links”
		WHERE “label_links”.“target_type” = ‘Issue’
			AND “label_links”.“target_id” = “issues”.“id”
			AND “label_links”.“label_id” = 14918378
	)
	AND EXISTS (
		SELECT “label_links”.*
		FROM “label_links”
		WHERE “label_links”.“target_type” = ‘Issue’
			AND “label_links”.“target_id” = “issues”.“id”
			AND “label_links”.“label_id” IN (2779335, 3482026)
	)
	AND NOT (
		EXISTS (
			SELECT “label_links”.*
			FROM “label_links”
			WHERE “label_links”.“target_type” = ‘Issue’
				AND “label_links”.“target_id” = “issues”.“id”
				AND 1 = 0
		)
	)
	AND “issues”.“issue_type” IN (0, 1)
	AND “projects”.“archived” = FALSE
ORDER BY “issues”.“created_at” DESC,
	“issues”.“id” DESC
LIMIT 20 OFFSET 0;

Note one of the conditional clauses in the query:

	AND NOT (
		EXISTS (
			SELECT “label_links”.*
			FROM “label_links”
			WHERE “label_links”.“target_type” = ‘Issue’
				AND “label_links”.“target_id” = “issues”.“id”
				AND 1 = 0
		)
	)

This is equivalent to AND NOT (FALSE) but using the former produces a very slow query plan (the subquery for retrieving project ids gets materialized and nested under a loop):

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/7091/commands/25076#visualize-depesz

If we use AND NOT (FALSE), the query plan is much faster:

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/7091/commands/25077

Workarounds

Do not use deleted/non-existing labels as filters.

Edited Oct 28, 2021 by euko
Assignee Loading
Time tracking Loading