Ensure CI builds are pending before dropping them from the queue

What does this MR do and why?

When a build in the queue appears not to be pending, verify its state on the primary database before removing it from the queue. This prevents stale data in the ci_pending_builds table from causing the removal of builds that are still pending on the primary.

The new ci_build_confirm_pending_state feature flag controls this behavior:

  • When enabled: Check the primary database to confirm the build state
  • When disabled: Use the existing behavior (remove immediately)
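The two flag states above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: `Build`, `drop_from_queue?`, `primary_lookup`, and `flag_enabled` are hypothetical names, and the sketch assumes the first read of the build may come from a lagging replica.

```ruby
# Hypothetical stand-in for a CI build record; not GitLab's Ci::Build.
Build = Struct.new(:id, :status) do
  def pending?
    status == :pending
  end
end

# Decide whether a queue entry should be dropped.
#
# observed_build: the build as first read (assumed: possibly from a lagging replica)
# primary_lookup: callable that re-reads the build by id (assumed: from the primary)
# flag_enabled:   stands in for the ci_build_confirm_pending_state feature flag
def drop_from_queue?(observed_build, primary_lookup:, flag_enabled:)
  # A build that looks pending stays in the queue.
  return false if observed_build.pending?

  # Flag disabled: existing behavior, remove immediately.
  return true unless flag_enabled

  # Flag enabled: only drop if the primary agrees the build is not pending.
  !primary_lookup.call(observed_build.id).pending?
end

# Usage: the first read is stale (:created), but the primary already says :pending.
primary = { 1 => Build.new(1, :pending) }
lookup  = ->(id) { primary.fetch(id) }
stale   = Build.new(1, :created)

drop_from_queue?(stale, primary_lookup: lookup, flag_enabled: true)  # => false
drop_from_queue?(stale, primary_lookup: lookup, flag_enabled: false) # => true
```

The extra read only happens on the removal path, so builds that already look pending pay no additional cost in this sketch.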

This MR also adds a new metric (build_stale_pending) that tracks builds found to still be pending after the check against the primary.
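As a rough sketch of how such a metric might be recorded, assuming it is a simple counter: `QueueMetrics` and `record_primary_check` are hypothetical names, and only the `build_stale_pending` label comes from this MR.

```ruby
# Hypothetical in-memory counter; a real deployment would use its metrics client.
class QueueMetrics
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def increment(operation)
    @counts[operation] += 1
  end
end

# Record the outcome of the primary check and report whether the build
# should stay in the queue. still_pending is the result of re-reading
# the build's state (assumed: from the primary).
def record_primary_check(metrics, still_pending:)
  metrics.increment(:build_stale_pending) if still_pending
  still_pending
end

metrics = QueueMetrics.new
record_primary_check(metrics, still_pending: true)   # counted as stale
record_primary_check(metrics, still_pending: false)  # not counted
metrics.counts[:build_stale_pending] # => 1
```

In this sketch, a non-zero count means the check prevented at least one incorrect removal.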

Fixes cases where replica lag caused builds to be incorrectly removed from the queue.

Relates to https://gitlab.com/gitlab-com/request-for-help/-/issues/3690

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Stan Hu
