Auto-merge fails for concurrent MRs: mark_as_unchecked breaks merge flow for sibling MRs

Summary

Auto-merge (merge_when_checks_pass) fails for concurrent MRs targeting the same branch. When multiple MRs have auto-merge enabled and their pipelines succeed within a short window, a chain reaction of mark_as_unchecked calls prevents ~60% of MRs (18 of 30 in the baseline test) from reaching the can_be_merged state. Those MRs stay stuck because there is no timeout or recovery mechanism.

This is a follow-up to #592733 (closed). The fix in !217382 (merged) (callback on transition to can_be_merged) resolves the single-MR scenario but does not help when concurrent merges reset sibling MRs back to unchecked.

Failure mechanism

  1. Multiple MRs target the same branch with auto-merge enabled, and their pipelines succeed
  2. AutoMergeProcessWorker picks up MR-A, merges it into the target branch
  3. mark_as_unchecked resets merge_status for all other MRs targeting that branch (MR-B, MR-C, ...)
  4. Workers for MR-B/MR-C see mergeable?=false (status is unchecked) and return without action
  5. The can_be_merged callback from !217382 (merged) never fires because the transition from unchecked to can_be_merged does not happen without an external trigger
  6. MR-B eventually gets rechecked and merges, but this triggers mark_as_unchecked again for MR-C and others
  7. Cycle repeats: each merge unsticks 1-2 MRs but resets the rest

No internal mechanism re-triggers mergeability checks after mark_as_unchecked for MRs with auto-merge enabled.
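
The chain reaction in steps 2-5 can be illustrated with a deliberately simplified toy model. This is a sketch, not GitLab code: run_auto_merge and its data structures are hypothetical, jobs are processed strictly one at a time, and nothing ever rechecks an unchecked MR. Under those assumptions the stall is worse than on a real instance, where some MRs do get rechecked (step 6).

```python
from collections import deque


def run_auto_merge(num_mrs):
    """Toy model: one queued worker job per MR, processed in order."""
    # Assumption: every MR starts mergeable, with a green pipeline
    # and auto-merge enabled.
    merge_status = {mr: "can_be_merged" for mr in range(num_mrs)}
    merged = []
    jobs = deque(range(num_mrs))  # stand-in for AutoMergeProcessWorker jobs

    while jobs:
        mr = jobs.popleft()
        if merge_status[mr] != "can_be_merged":
            # Step 4: mergeable? is false, so the worker returns without action.
            continue
        merged.append(mr)
        merge_status[mr] = "merged"
        # Step 3: the merge resets every still-open sibling to unchecked.
        for sibling, status in merge_status.items():
            if status != "merged":
                merge_status[sibling] = "unchecked"

    stuck = [mr for mr, status in merge_status.items() if status == "unchecked"]
    return merged, stuck


merged, stuck = run_auto_merge(30)
print(f"merged={len(merged)} stuck={len(stuck)}")  # merged=1 stuck=29
```

In this model only the first MR merges. The 40% observed in the baseline test is higher because other activity on a real instance occasionally triggers a recheck, but no part of the auto-merge flow guarantees one.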

Steps to reproduce

  1. Create 30+ MRs targeting the same branch, each with a trivial one-line change
  2. Enable auto-merge on all of them (via API or glab mr merge --auto-merge)
  3. Wait for all pipelines to succeed
  4. Observe: 12 out of 30 merge (40%); the rest stay stuck with merge_status=unchecked and auto_merge_enabled=true

Workaround

An external GET /api/v4/projects/:id/merge_requests/:iid call on a stuck MR triggers check_mergeability through the response serializer (via detailed_merge_status), forcing unchecked -> checking -> can_be_merged -> callback -> merge. This depends on polling from outside GitLab, and the polling variants in the test results below merged at most 80% of MRs, so it is only a partial mitigation.
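
A minimal sketch of such an external poller follows, assuming a numeric project ID and a personal access token sent in the PRIVATE-TOKEN header. The helper names fetch_mr and nudge_stuck_mrs are hypothetical, and the defaults of 30 rounds with a 20-second delay simply echo the "30x20s" retry loop from the test results rather than a recommended setting.

```python
import json
import time
import urllib.request


def fetch_mr(base_url, project_id, iid, token):
    """GET a single MR. Per the workaround, this read is what triggers
    the mergeability recheck on the server side."""
    url = f"{base_url}/api/v4/projects/{project_id}/merge_requests/{iid}"
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def nudge_stuck_mrs(iids, fetch, rounds=30, delay_seconds=20, sleep=time.sleep):
    """Poll each MR until it reports state == "merged" or rounds run out.

    `fetch` is any callable that takes an iid and returns the MR as a dict.
    Returns the sorted list of iids that are still not merged.
    """
    pending = set(iids)
    for _ in range(rounds):
        for iid in sorted(pending):
            mr = fetch(iid)
            if mr.get("state") == "merged":
                pending.discard(iid)
        if not pending:
            break
        sleep(delay_seconds)
    return sorted(pending)
```

To use it against a real instance, pass something like `lambda iid: fetch_mr("https://gitlab.example.com", 42, iid, token)` as `fetch`, where the host and project ID are placeholders. Injecting `fetch` and `sleep` keeps the loop testable without network access.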

Test results (GitLab CE 18.9.3)

Approach                           Batch size   Merged   Stuck   Merge rate
Baseline (approve + auto-merge)    30           12       18      40%
+ retry loop (30x20s)              30           19       11      63%
+ poll GET + retry loop            50           34       16      68%
+ after_script GET nudge           50           39       11      78%
+ after_script list API recheck    50           40       10      80%
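
The rates in the table can be recomputed from the merged and stuck counts, which also confirms that the baseline leaves 60% of MRs stuck. The snippet below only restates the figures above; merge_rate is a hypothetical helper name.

```python
# (approach, batch size, merged, stuck), copied from the test results table
RESULTS = [
    ("Baseline (approve + auto-merge)", 30, 12, 18),
    ("+ retry loop (30x20s)", 30, 19, 11),
    ("+ poll GET + retry loop", 50, 34, 16),
    ("+ after_script GET nudge", 50, 39, 11),
    ("+ after_script list API recheck", 50, 40, 10),
]


def merge_rate(batch, merged):
    """Merge rate as a whole-number percentage."""
    return round(100 * merged / batch)


for approach, batch, merged, stuck in RESULTS:
    # Sanity check: every MR in the batch is either merged or stuck.
    assert merged + stuck == batch
    print(f"{approach}: {merge_rate(batch, merged)}% merged, "
          f"{merge_rate(batch, stuck)}% stuck")
```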

Environment

  • GitLab CE 18.9.3, single-node (8 vCPU, 16 GB RAM)
  • PostgreSQL 16 (managed), Redis (local container)
  • Sidekiq: 4 processes x 20 concurrency

Proposal

When mark_as_unchecked resets merge_status for MRs that have auto_merge_enabled=true, it should also re-enqueue AutoMergeProcessWorker for those MRs (similar to the can_be_merged callback in !217382 (merged)). Auto-merge MRs would then be re-evaluated after their mergeability status is invalidated, instead of waiting for an external trigger.

What is the current bug behavior?

MRs with auto-merge enabled get stuck in merge_status=unchecked when concurrent merges trigger mark_as_unchecked. No timeout or recovery mechanism exists.

What is the expected correct behavior?

All MRs with auto-merge enabled should eventually merge after their pipelines succeed, regardless of batch size (tested with 30-50 MRs targeting the same branch).

Relevant logs and/or screenshots

See test results table above.

GitLab version

18.9.3-ee (CE features only)
