Re-enqueue auto-merge worker for unchecked MRs

What does this MR do and why?

Related to #594868 — auto-merge fails for concurrent MRs targeting the same branch.

When multiple MRs target the same branch with auto-merge enabled and their pipelines succeed concurrently, merging one MR triggers mark_as_unchecked on all sibling MRs. Nothing re-triggers the mergeability check for those MRs afterwards, so they stay stuck in the unchecked state with auto_merge_enabled=true.

Changes

  • Added enqueue_auto_merge_for_unchecked in RefreshService that filters already-loaded MRs in Ruby (no extra DB query) for those with auto_merge_enabled=true and enqueues AutoMergeProcessWorker
  • Uses fixed 3-second staggered delays (0s–57s) so workers from the same or overlapping push events don't synchronize — each MR gets a delay of index * 3 seconds
  • Capped at 20 MRs per push event to bound queue volume; remaining MRs are picked up in subsequent rounds as merges trigger new pushes
  • Gated behind the auto_merge_on_mark_as_unchecked feature flag with a project actor
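
The changes above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual RefreshService code: the MergeRequest struct, FakeAutoMergeProcessWorker, the constant names, the flag_enabled: argument, and the job argument shape are all hypothetical stand-ins. Only the 3-second interval, the cap of 20, and the in-memory filter on auto_merge_enabled come from this description.

```ruby
# Hypothetical stand-in for a loaded merge request record.
MergeRequest = Struct.new(:id, :auto_merge_enabled, keyword_init: true)

# Hypothetical stand-in for AutoMergeProcessWorker. It records scheduled jobs
# instead of pushing them to a queue.
class FakeAutoMergeProcessWorker
  @jobs = []

  class << self
    attr_reader :jobs

    # Same shape as a delayed enqueue: a delay in seconds plus job arguments.
    def perform_in(delay_seconds, args)
      @jobs << { delay: delay_seconds, args: args }
    end
  end
end

STAGGER_SECONDS = 3   # from the description: delay is index * 3 seconds
MAX_ENQUEUED    = 20  # from the description: cap per push event

# Filters the already-loaded MRs in Ruby, caps the batch, and schedules one
# worker per MR with a staggered delay. The flag check is modelled as a
# plain boolean argument (assumption).
def enqueue_auto_merge_for_unchecked(merge_requests, flag_enabled:, worker: FakeAutoMergeProcessWorker)
  return unless flag_enabled

  merge_requests
    .select(&:auto_merge_enabled)
    .first(MAX_ENQUEUED)
    .each_with_index do |mr, index|
      worker.perform_in(index * STAGGER_SECONDS, { 'merge_request_id' => mr.id })
    end
end
```

With more than 20 eligible MRs, only the first 20 are scheduled in this sketch; the rest would be handled by later pushes, as described above.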

How the fix works

When a push to a target branch (e.g. master) occurs, UpdateMergeRequestsWorker calls RefreshService, which runs batch_mark_as_unchecked on all open MRs targeting that branch. After this, enqueue_auto_merge_for_unchecked identifies MRs with auto_merge_enabled=true from the already-loaded set and enqueues AutoMergeProcessWorker with staggered delays.

The worker calls mergeable? which triggers MergeabilityCheckService, transitioning the MR through checking → can_be_merged, at which point the existing can_be_merged callback processes the auto-merge.
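
As a rough illustration of that flow, here is a toy model. It is not the real state machine: ToyMergeRequest, the conflict_free: argument, the cannot_be_merged branch, and the auto_merge_processed flag are assumptions made for the example. Only the unchecked, checking, and can_be_merged states and the callback on can_be_merged come from this description.

```ruby
# Toy model only; the real MR model and MergeabilityCheckService are far richer.
class ToyMergeRequest
  attr_reader :merge_status, :auto_merge_processed

  def initialize(auto_merge_enabled:)
    @auto_merge_enabled = auto_merge_enabled
    @merge_status = :unchecked
    @auto_merge_processed = false
  end

  # Stand-in for mergeable?: an unchecked MR is re-checked before answering.
  def mergeable?(conflict_free: true)
    run_mergeability_check(conflict_free) if @merge_status == :unchecked
    @merge_status == :can_be_merged
  end

  private

  # Stand-in for the mergeability check: unchecked -> checking -> result.
  def run_mergeability_check(conflict_free)
    @merge_status = :checking
    if conflict_free
      @merge_status = :can_be_merged
      on_can_be_merged
    else
      @merge_status = :cannot_be_merged # assumed failure state
    end
  end

  # Stand-in for the existing can_be_merged callback that processes auto-merge.
  def on_can_be_merged
    @auto_merge_processed = true if @auto_merge_enabled
  end
end
```

The point of the sketch is that nothing happens to an unchecked MR until something calls mergeable?, which is the gap the re-enqueued worker closes.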

Worker deduplication (deduplicate :until_executed, if_deduplicated: :reschedule_once) prevents duplicate processing per MR.

Why staggered delays?

Each merge into a target branch triggers mark_as_unchecked for all sibling MRs, re-enqueuing workers. Without staggering, N sequential merges produce O(N²) near-simultaneous workers. A fixed 3-second interval ensures workers are evenly spaced and predictable. The 20 MR cap limits queue volume per push event while the remaining MRs drain naturally in subsequent rounds.

MRs remaining    Delay range
1–5              0s–12s
6–10             15s–27s
11–15            30s–42s
16–20            45s–57s
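
The delay ranges above follow directly from index * 3 seconds with a cap of 20. The sketch below recomputes them and also counts enqueues under a deliberately simplified model of the O(N²) claim. The constant and method names are hypothetical, and total_enqueues assumes that each merge re-enqueues one worker for every sibling still open, with no cap applied.

```ruby
INTERVAL_SECONDS = 3  # fixed stagger from the description
BATCH_CAP = 20        # per-push cap from the description

# Delay in seconds for the MR at 1-based position `position` in the batch.
def delay_for(position)
  (position - 1) * INTERVAL_SECONDS
end

# Rows of five positions with the first and last delay in each row.
def delay_rows
  (1..BATCH_CAP).each_slice(5).map do |group|
    { positions: group.first..group.last,
      delays: delay_for(group.first)..delay_for(group.last) }
  end
end

# Simplified model (an assumption): each of n sequential merges re-enqueues
# a worker for every sibling that is still open, with no cap applied.
def total_enqueues(n)
  (1..n).sum { |merged_so_far| n - merged_so_far }
end

delay_rows.each do |row|
  puts "#{row[:positions]}: #{row[:delays].first}s to #{row[:delays].last}s"
end
```

Under that model the total is n(n-1)/2, which grows quadratically; the stagger does not reduce the count but spreads each batch over up to 57 seconds.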

Why service-layer only (no model callback)

Side effects are kept in RefreshService rather than an after_transition model callback because:

  1. The service layer has the full batch of affected MRs already loaded in memory, avoiding extra DB queries
  2. Avoids surprising background job fan-out from state machine transitions
  3. RefreshService is the code path that handles the concurrent merge scenario (triggered by UpdateMergeRequestsWorker on every push)

Feature flag

  • Name: auto_merge_on_mark_as_unchecked
  • Type: gitlab_com_derisk
  • Rollout issue: #594893

MR acceptance checklist

  • Tests added for staggered delays and batch limit
  • Feature flag YAML created
  • Rollout issue created: #594893
Edited by Marc Shaw