Override ci_job_artifacts metrics queries to avoid expensive join

Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.

Problem

The Geo metrics collection for ci_job_artifacts performs an expensive join between p_ci_job_artifacts and ci_job_artifact_states that causes significant database load on large instances. The current query pattern:

SELECT COUNT("p_ci_job_artifacts"."id") FROM "p_ci_job_artifacts"
INNER JOIN "ci_job_artifact_states" ON 
    "ci_job_artifact_states"."partition_id" IS NOT NULL
    AND "ci_job_artifact_states"."job_artifact_id" = "p_ci_job_artifacts"."id"
    AND "ci_job_artifact_states"."partition_id" = "p_ci_job_artifacts"."partition_id"
WHERE "p_ci_job_artifacts"."id" >= $1 
AND "p_ci_job_artifacts"."id" < $2;

This join performs random lookups for each row, consuming excessive memory bandwidth (observed at ~15GB/s on dedicated instances).

Proposal

Override the count queries in Geo::JobArtifactReplicator to query ci_job_artifact_states directly, leveraging the foreign key constraint. This is safe because the foreign key guarantees referential integrity.

Example implementation:

def self.primary_total_count
  batch_count(Geo::JobArtifactState)
end

Similar overrides would be added for checksummed_count, checksum_failed_count, and checksum_total_count.

This provides approximately 4x performance improvement and reduces database pressure significantly.

Related to #512646

Edited by 🤖 GitLab Bot 🤖