Resolve "For follow-up, fix pending job metric"
What does this MR do and why?
Fixes the pending job queue size metric, which has been capped at MAX_QUEUE_DEPTH since !225079 (merged). That MR added a LIMIT to the queue query to address slow database queries when many builds are pending. As a side effect, the queue_size metric stopped reflecting the true backlog under high load, which made it harder for incident responders to assess pending CI work accurately from dashboards.
This MR introduces a conditional full count. When the limited query returns more results than MAX_QUEUE_DEPTH and the ci_register_job_full_queue_count feature flag is enabled, an additional COUNT query runs on build_candidates to determine the actual queue size. Otherwise, the limited size is reported as before.
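The decision logic described above can be sketched roughly as follows. This is a minimal Python illustration, not the GitLab implementation. The names `queue_size` and `count_all` are hypothetical, the `MAX_QUEUE_DEPTH` value is illustrative only, and the boolean `full_count_enabled` stands in for the `ci_register_job_full_queue_count` feature flag check. It also assumes the limited query can return more than MAX_QUEUE_DEPTH rows (for example, by fetching one extra row) so that an overflow is detectable.

```python
from typing import Callable

# Illustrative value only; not taken from the GitLab codebase.
MAX_QUEUE_DEPTH = 100


def queue_size(
    limited_size: int,
    full_count_enabled: bool,
    count_all: Callable[[], int],
) -> int:
    """Return the queue size to report for the pending job metric.

    limited_size: rows returned by the LIMITed build_candidates query.
    full_count_enabled: stand-in for the feature flag check (hypothetical).
    count_all: callable that runs the extra COUNT query (hypothetical).
    """
    if limited_size > MAX_QUEUE_DEPTH and full_count_enabled:
        # The limited result overflowed, so pay for one full COUNT to get
        # the real backlog instead of reporting a capped number.
        return count_all()
    # Below the cap, or flag disabled: keep the cheap limited size.
    return limited_size
```

Because `count_all` is only invoked in the overflow branch, the extra COUNT cost is incurred only under high load with the flag enabled, while job matching keeps using the limited query.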
This approach preserves the performance improvement from !225079 (merged) for the job matching logic while restoring metric accuracy behind a feature flag for safe rollout.
References
- Related to #593402 (closed)
- Feature flag rollout: #594931
- Root cause MR: !225079 (merged)
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.