Resolve "Fix query timeout in AdjournedProjectsDeletionCronWorker"
What does this MR do and why?
For #428445 (closed)
Why?
The query on AdjournedProjectsDeletionCronWorker
are failing.
https://log.gprd.gitlab.net/app/r/s/NLm06
The query generated from this worker is:
Let's name this Query 1
SELECT "projects"."id",
"projects"."name",
"projects"."path",
"projects"."description",
"projects"."created_at",
"projects"."updated_at",
"projects"."creator_id",
"projects"."namespace_id",
"projects"."last_activity_at",
"projects"."import_url",
"projects"."visibility_level",
"projects"."archived",
"projects"."avatar",
"projects"."merge_requests_template",
"projects"."star_count",
"projects"."merge_requests_rebase_enabled",
"projects"."import_type",
"projects"."import_source",
"projects"."approvals_before_merge",
"projects"."reset_approvals_on_push",
"projects"."merge_requests_ff_only_enabled",
"projects"."issues_template",
"projects"."mirror",
"projects"."mirror_last_update_at",
"projects"."mirror_last_successful_update_at",
"projects"."mirror_user_id",
"projects"."shared_runners_enabled",
"projects"."runners_token",
"projects"."build_allow_git_fetch",
"projects"."build_timeout",
"projects"."mirror_trigger_builds",
"projects"."pending_delete",
"projects"."public_builds",
"projects"."last_repository_check_failed",
"projects"."last_repository_check_at",
"projects"."only_allow_merge_if_pipeline_succeeds",
"projects"."has_external_issue_tracker",
"projects"."repository_storage",
"projects"."repository_read_only",
"projects"."request_access_enabled",
"projects"."has_external_wiki",
"projects"."ci_config_path",
"projects"."lfs_enabled",
"projects"."description_html",
"projects"."only_allow_merge_if_all_discussions_are_resolved",
"projects"."repository_size_limit",
"projects"."printing_merge_request_link_enabled",
"projects"."auto_cancel_pending_pipelines",
"projects"."service_desk_enabled",
"projects"."cached_markdown_version",
"projects"."delete_error",
"projects"."last_repository_updated_at",
"projects"."disable_overriding_approvers_per_merge_request",
"projects"."storage_version",
"projects"."resolve_outdated_diff_discussions",
"projects"."remote_mirror_available_overridden",
"projects"."only_mirror_protected_branches",
"projects"."pull_mirror_available_overridden",
"projects"."jobs_cache_index",
"projects"."external_authorization_classification_label",
"projects"."mirror_overwrites_diverged_branches",
"projects"."pages_https_only",
"projects"."external_webhook_token",
"projects"."packages_enabled",
"projects"."merge_requests_author_approval",
"projects"."pool_repository_id",
"projects"."runners_token_encrypted",
"projects"."bfg_object_map",
"projects"."detected_repository_languages",
"projects"."merge_requests_disable_committers_approval",
"projects"."require_password_to_approve",
"projects"."max_pages_size",
"projects"."max_artifacts_size",
"projects"."pull_mirror_branch_prefix",
"projects"."remove_source_branch_after_merge",
"projects"."marked_for_deletion_at",
"projects"."marked_for_deletion_by_user_id",
"projects"."autoclose_referenced_issues",
"projects"."suggestion_commit_message",
"projects"."project_namespace_id",
"projects"."hidden",
"projects"."organization_id"
FROM "projects"
WHERE (marked_for_deletion_at <= '2023-10-10')
AND "projects"."pending_delete" = FALSE
ORDER BY "projects"."id" ASC
LIMIT 100
Upon executing the same query on database-lab
, it also times out:
and surprisingly it appears to be using the index: idx_projects_on_repository_storage_last_repository_updated_at
, which is defined as:
CREATE INDEX idx_projects_on_repository_storage_last_repository_updated_at ON projects USING btree (id, repository_storage, last_repository_updated_at);
which isn't of any use to improve the performance of this query.
We already have an index named index_projects_aimed_for_deletion
:
CREATE INDEX index_projects_aimed_for_deletion ON projects USING btree (marked_for_deletion_at) WHERE ((marked_for_deletion_at IS NOT NULL) AND (pending_delete = false));
for this use case, but it isn't being used by the planner, which is surprising.
However, this index is currently being used by another query in our app, first introduced in MR: !37014 (merged)
Let's name this Query 2
SELECT
"projects".*
FROM
"projects"
WHERE (EXISTS (
SELECT
1
FROM
"project_authorizations"
WHERE
"project_authorizations"."user_id" = 506061
AND (project_authorizations.project_id = projects.id))
OR projects.visibility_level IN (0, 10, 20))
AND (marked_for_deletion_at <= '2020-07-21')
AND "projects"."pending_delete" = FALSE
ORDER BY
"projects"."id" DESC
Plan from that MR: https://explain.depesz.com/s/cOCU - we can see that the index used is the current index_projects_aimed_for_deletion
.
Solution:
To be on the safer side, we are not deleting the current index so as to not disturb the current timings of Query 2. However, I tried to devise an index that would work for both query 1 and query 2, so that we can remove the current index_projects_aimed_for_deletion
index at a later time.
For query 1, my first try was creating a new index:
exec CREATE INDEX index_projects_aimed_for_deletion_new ON projects USING btree (id) WHERE ((marked_for_deletion_at IS NOT NULL) AND (pending_delete = false));
This worked for Query 1, but did not work for Query 2 effectively.
So, I changed the index to:
exec CREATE INDEX index_projects_aimed_for_deletion_new ON projects USING btree (id, marked_for_deletion_at) WHERE ((marked_for_deletion_at IS NOT NULL) AND (pending_delete = false));
and that seems to have worked for both Query 1 and Query 2.
Query plans:
On database-lab, I dropped the current index and created the new index as in this MR:
With this change, I ran both Query 1 and Query 2, and the results are:
Query 1: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/23241/commands/74753
Query 2: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/23241/commands/74752
And both seem to be performant.
However, we are not dropping the current index in this MR, so to test this state:
We introduce the new index:
Query plans:
Query 1: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/23249/commands/74760 - uses our new index and is performant.
Query 2: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/23249/commands/74759 - continues to use the existing index index_projects_aimed_for_deletion
.
Follow up
Drop the index index_projects_aimed_for_deletion
in a follow up MR at a later stage. #428732
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
Before | After |
---|---|
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #428445 (closed)