Geo: Override ci_job_artifacts metrics queries to avoid expensive join
What does this MR do and why?
Override checksum queries in JobArtifacts
This ensures the checksum counts are calculated from the state table and not as a join from the model table. This is to speed up counts when collecting Geo site status.
References
Related to #581987
Database query plans
Geo::JobArtifactReplicator.checksummed_count
MAX query:
SELECT MAX("ci_job_artifact_states"."job_artifact_id") FROM "ci_job_artifact_states" WHERE "ci_job_artifact_states"."verification_state" = 2
Plan (produced on 5000 records): https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/46311/commands/141342
COUNT query:
SELECT COUNT("ci_job_artifact_states"."job_artifact_id") FROM "ci_job_artifact_states" WHERE "ci_job_artifact_states"."verification_state" = 2 AND "ci_job_artifact_states"."job_artifact_id" >= 0 AND "ci_job_artifact_states"."job_artifact_id" < MAX
Plan (produced on 5000 records): https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/46311/commands/141357 I used an arbitrarily high number for the MAX to ensure I'd encompass all records.
Previous COUNT query:
SELECT COUNT("p_ci_job_artifacts"."id") FROM "p_ci_job_artifacts" INNER JOIN "ci_job_artifact_states" ON "ci_job_artifact_states"."partition_id" IS NOT NULL AND "ci_job_artifact_states"."job_artifact_id" = "p_ci_job_artifacts"."id" AND "ci_job_artifact_states"."partition_id" = "p_ci_job_artifacts"."partition_id" WHERE "ci_job_artifact_states"."verification_state" = 2 AND "p_ci_job_artifacts"."id" >= 0 AND "p_ci_job_artifacts"."id" < MAX
Geo::JobArtifactReplicator.checksum_failed_count
MAX query:
SELECT MAX("ci_job_artifact_states"."job_artifact_id") FROM "ci_job_artifact_states" WHERE "ci_job_artifact_states"."verification_state" = 3
Plan (produced on 5000 records): https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/46311/commands/141344
COUNT query:
SELECT COUNT("ci_job_artifact_states"."job_artifact_id") FROM "ci_job_artifact_states" WHERE "ci_job_artifact_states"."verification_state" = 3 AND "ci_job_artifact_states"."job_artifact_id" >= 0 AND "ci_job_artifact_states"."job_artifact_id" < MAX
Plan (produced on 5000 records): https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/46311/commands/141356 I used an arbitrarily high number for the MAX to ensure I'd encompass all records.
Previous COUNT query:
SELECT COUNT("p_ci_job_artifacts"."id") FROM "p_ci_job_artifacts" INNER JOIN "ci_job_artifact_states" ON "ci_job_artifact_states"."partition_id" IS NOT NULL AND "ci_job_artifact_states"."job_artifact_id" = "p_ci_job_artifacts"."id" AND "ci_job_artifact_states"."partition_id" = "p_ci_job_artifacts"."partition_id" WHERE "ci_job_artifact_states"."verification_state" = 3 AND "p_ci_job_artifacts"."id" >= 0 AND "p_ci_job_artifacts"."id" < MAX
Geo::JobArtifactReplicator.checksum_total_count
MAX query:
SELECT MAX("ci_job_artifact_states"."job_artifact_id") FROM "ci_job_artifact_states"
Plan (produced on 5000 records): https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/46311/commands/141343
COUNT query:
SELECT COUNT("ci_job_artifact_states"."job_artifact_id") FROM "ci_job_artifact_states" WHERE "ci_job_artifact_states"."job_artifact_id" >= 0 AND "ci_job_artifact_states"."job_artifact_id" < MAX?
Plan (produced on 5000 records): https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/46311/commands/141355 I used an arbitrarily high number for the MAX to ensure I'd encompass all records.
Previous COUNT query:
SELECT COUNT("p_ci_job_artifacts"."id") FROM "p_ci_job_artifacts" INNER JOIN "ci_job_artifact_states" ON "ci_job_artifact_states"."partition_id" IS NOT NULL AND "ci_job_artifact_states"."job_artifact_id" = "p_ci_job_artifacts"."id" AND "ci_job_artifact_states"."partition_id" = "p_ci_job_artifacts"."partition_id" WHERE "p_ci_job_artifacts"."id" >= 0 AND "p_ci_job_artifacts"."id" < MAX
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.