Migration ScheduleToArchiveLegacyTraces does not complete in production
The migration ScheduleToArchiveLegacyTraces
was added in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/19194. The query plan in this migration suggested it would take around 5 seconds to run. Unfortunately while deploying this migration, it never finishes. The underlying query to grab the data to process times out after 15 seconds, and will happily continue running pretty much forever with a larger timeout. Adding an index using create index concurrently yorick_test_deployments_yay on ci_builds (id) where type = 'Ci::Build' AND status IN ('success', 'failed', 'canceled');
did not improve performance of this query.
To work around this we have manually marked this migration as completed, allowing us to at least run the other migrations. This migration needs to be adjusted somehow, or turned into a noop, before 11.0 stable is deployed.
Query plan using just EXPLAIN
:
explain SELECT count(*) FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."status" IN ('success', 'failed', 'canceled') AND (NOT EXISTS (SELECT 1 FROM ci_job_artifacts WHERE ci_builds.id = ci_job_artifacts.job_id AND ci_job_artifacts.file_type = 3));
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=14307364.20..14307364.21 rows=1 width=8)
-> Merge Anti Join (cost=47.65..14184918.76 rows=48978178 width=0)
Merge Cond: (ci_builds.id = ci_job_artifacts.job_id)
-> Index Scan using ci_builds_pkey on ci_builds (cost=0.57..12174471.65 rows=55682317 width=4)
Filter: (((type)::text = 'Ci::Build'::text) AND ((status)::text = ANY ('{success,failed,canceled}'::text[])))
-> Index Only Scan using index_ci_job_artifacts_on_job_id_and_file_type on ci_job_artifacts (cost=0.56..1378469.34 rows=46910515 width=4)
Index Cond: (file_type = 3)
The first index scan processes an estimated 55 682 317 rows, while the second one processes an estimated 46 910 515 rows. This in combination with WHERE NOT EXISTS
means it will probably take hours for this query to run.