postgres timeout MergeRequests::ContentController
Summary
Yesterday we upgraded our gitlab instance to v16.11.1-ee, now we see timeout errors in our database.
The upgrade forced upgrading to postgres14
the upgrade did not run smoothly, we had to run with increased timeout for postgres migrations
gitlab-ctl pg-upgrade --timeout=3600
When opening a MR view the MR status (last pipeline, artifacts etc) takes a long time (60s+) to load, so do the comments etc.
upon inspecting the postgres logs we found these queries timeout
2024-05-03_07:57:07.75104 STATEMENT: /*application:web,correlation_id:01HWYQ83YN8K6G5JPAC6NH26EX,endpoint_id:Projects::MergeRequests::ContentController#cached_widget,db_config_name:ci*/ SELECT 1 AS one FROM "p_ci_builds" INNER JOIN "p_ci_builds_metadata" ON "p_ci_builds_metadata"."partition_id" IS NOT NULL AND "p_ci_builds_metadata"."build_id" = "p_ci_builds"."id" AND "p_ci_builds_metadata"."partition_id" = "p_ci_builds"."partition_id" LEFT OUTER JOIN "p_ci_job_artifacts" ON "p_ci_job_artifacts"."file_type" = 2 AND "p_ci_job_artifacts"."partition_id" IS NOT NULL AND "p_ci_job_artifacts"."job_id" = "p_ci_builds"."id" AND "p_ci_job_artifacts"."partition_id" = "p_ci_builds"."partition_id" WHERE "p_ci_builds"."type" = 'Ci::Build' AND "p_ci_builds"."commit_id" = 751671 AND "p_ci_builds"."partition_id" = 100 AND ("p_ci_builds"."retried" = FALSE OR "p_ci_builds"."retried" IS NULL) AND "p_ci_builds_metadata"."has_exposed_artifacts" = TRUE LIMIT 1
2024-05-03_07:57:07.75107 FATAL: connection to client lost
2024-05-03_07:57:07.99096 ERROR: canceling statement due to statement timeout
What is the current bug behavior?
- MR view does not load in a reasonable time
- timeout errors in postgres log
What is the expected correct behavior?
- MR view loads in a few seconds
- no errors in postgres log
Relevant logs and/or screenshots
Edited by Martin Zihlmann