Ensure CI builds are pending before dropping them from the queue
What does this MR do and why?
When a build in the queue appears not to be pending, verify its state on the primary database before removing it from the queue. This prevents stale data in the ci_pending_builds table from causing builds that are still pending on the primary to be removed incorrectly.
The new ci_build_confirm_pending_state feature flag controls this
behavior:
- When enabled: Check the primary database to confirm the build state
- When disabled: Use the existing behavior (remove immediately)
This adds a new metric (build_stale_pending) to track when builds are
found to still be pending after checking the primary.
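The flag-gated check and the metric described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this MR: `Build`, `QueueProcessor`, the `primary` hash, and the `flag_enabled` argument are hypothetical stand-ins for the real models, the primary-database read, and the `ci_build_confirm_pending_state` feature flag check. Only the flag name and the `build_stale_pending` metric name come from the description.

```ruby
# Hypothetical sketch: every class and method name here is illustrative.
Build = Struct.new(:id, :status) do
  def pending?
    status == :pending
  end
end

class QueueProcessor
  attr_reader :metrics, :removed

  def initialize(primary:, flag_enabled:)
    @primary = primary            # Hash of id => Build, standing in for a read from the primary
    @flag_enabled = flag_enabled  # stands in for the ci_build_confirm_pending_state flag
    @metrics = Hash.new(0)        # stands in for the real metrics counter
    @removed = []                 # ids dropped from the queue
  end

  # `build` is the possibly stale copy the queue is working with.
  # Returns :keep or :removed.
  def process(build)
    return :keep if build.pending?

    if @flag_enabled
      fresh = @primary[build.id]
      if fresh&.pending?
        # The build is still pending on the primary, so the earlier read was stale.
        @metrics[:build_stale_pending] += 1
        return :keep
      end
    end

    # Flag disabled, or the primary confirms the build is not pending.
    @removed << build.id
    :removed
  end
end
```

With the flag disabled, the sketch removes the build as soon as it looks non-pending, which mirrors the existing behavior. With the flag enabled, the extra read of the primary runs only on the removal path, so builds that already look pending incur no additional query.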
Fixes incorrect removal of builds from the queue caused by replica lag.
Relates to https://gitlab.com/gitlab-com/request-for-help/-/issues/3690
References
How to set up and validate locally
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.