Auto-merge fails for concurrent MRs: mark_as_unchecked breaks merge flow for sibling MRs
Summary
Auto-merge (merge_when_checks_pass) fails for concurrent MRs targeting the same branch. When multiple MRs have auto-merge enabled and their pipelines succeed within a short window, a chain reaction of mark_as_unchecked calls prevents roughly 60% of the MRs from reaching the can_be_merged state. Those MRs are left stuck, with no timeout or recovery mechanism.
This is a follow-up to #592733 (closed). The fix in !217382 (merged) adds a callback on the transition to can_be_merged. It resolves the single-MR scenario but does not help when concurrent merges reset sibling MRs back to unchecked.
Failure mechanism
- Multiple MRs target the same branch with auto-merge enabled, and their pipelines succeed.
- AutoMergeProcessWorker picks up MR-A and merges it into the target branch.
- mark_as_unchecked resets merge_status for all other MRs targeting that branch (MR-B, MR-C, ...).
- Workers for MR-B and MR-C see mergeable? = false (the status is unchecked) and return without action.
- The can_be_merged callback from !217382 (merged) never fires, because the transition from unchecked to can_be_merged does not happen without an external trigger.
- MR-B eventually gets rechecked and merges, but this triggers mark_as_unchecked again for MR-C and the others.
- The cycle repeats: each merge unsticks one or two MRs but resets the rest.
No internal mechanism re-triggers mergeability checks after mark_as_unchecked for MRs with auto-merge enabled.
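The failure mechanism above can be reproduced in a toy model. This is a minimal sketch, not GitLab code: MR, mark_as_unchecked, and auto_merge_process are hypothetical stand-ins, the last one for AutoMergeProcessWorker. The model deliberately leaves out the occasional external recheck that lets a few MRs through in practice, so it shows the worst case, where only the first MR merges.

```python
from dataclasses import dataclass


@dataclass
class MR:
    # Hypothetical, simplified merge request record.
    iid: int
    merge_status: str = "can_be_merged"
    auto_merge_enabled: bool = True
    merged: bool = False


def mark_as_unchecked(mrs, merged_iid):
    """Reset every other open MR on the target branch, as described in the issue."""
    for mr in mrs:
        if mr.iid != merged_iid and not mr.merged:
            mr.merge_status = "unchecked"


def auto_merge_process(mr, mrs):
    """Stand-in for the worker: merge if mergeable, otherwise return without action."""
    if mr.merged or mr.merge_status != "can_be_merged":
        return False  # mergeable? is false, and nothing re-enqueues this MR
    mr.merged = True
    mark_as_unchecked(mrs, mr.iid)
    return True


def simulate(n):
    mrs = [MR(i) for i in range(1, n + 1)]
    by_iid = {mr.iid: mr for mr in mrs}
    # All pipelines succeed within a short window, so one job per MR is queued up front.
    queue = [mr.iid for mr in mrs]
    while queue:
        auto_merge_process(by_iid[queue.pop(0)], mrs)
    return mrs
```

With simulate(5), MR 1 merges and MRs 2 to 5 end up with merge_status "unchecked" while auto_merge_enabled is still true, which matches the stuck state reported below.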
Steps to reproduce
- Create 30+ MRs targeting the same branch, each with a trivial one-line change.
- Enable auto-merge on all of them (via the API or glab mr merge --auto-merge).
- Wait for all pipelines to succeed.
- Observe: 12 out of 30 merge (40%); the rest stay stuck with merge_status=unchecked and auto_merge_enabled=true.
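The Observe step can be scripted. The helper below is a hypothetical sketch and was not part of the original test harness. It takes MRs already fetched from the list endpoint (GET /api/v4/projects/:id/merge_requests, pagination omitted). It assumes each item exposes the REST fields state, merge_status, and merge_when_pipeline_succeeds, with the last of these standing in for auto_merge_enabled.

```python
def summarize(mrs):
    """Classify fetched MRs into merged and stuck, and compute the merge rate in percent.

    Assumes REST-style dicts with 'state', 'merge_status', and
    'merge_when_pipeline_succeeds' keys (an assumption, not taken from the issue).
    """
    merged = sum(1 for mr in mrs if mr["state"] == "merged")
    stuck = sum(
        1
        for mr in mrs
        if mr["state"] == "opened"
        and mr["merge_status"] == "unchecked"
        and mr["merge_when_pipeline_succeeds"]
    )
    rate = round(100 * merged / len(mrs)) if mrs else 0
    return merged, stuck, rate
```

For the baseline run, 12 merged and 18 stuck MRs give (12, 18, 40).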
Workaround
An external GET /api/v4/projects/:id/merge_requests/:iid call on a stuck MR triggers check_mergeability through the response serializer (via detailed_merge_status), forcing unchecked -> checking -> can_be_merged -> callback -> merge. This requires external polling.
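The workaround can be automated with a small poller. This is a sketch under stated assumptions, and get_mr and nudge_until_merged are hypothetical helpers. The poller relies on the behavior described above, where serializing the single-MR response triggers check_mergeability. Its defaults of 30 attempts at 20 s intervals only loosely mirror the retry loop in the results table. The fetch function is injected so the loop can be exercised without a live instance.

```python
import json
import time
import urllib.request


def get_mr(base_url, token, project_id, iid):
    """GET /api/v4/projects/:id/merge_requests/:iid with a personal access token."""
    req = urllib.request.Request(
        f"{base_url}/api/v4/projects/{project_id}/merge_requests/{iid}",
        headers={"PRIVATE-TOKEN": token},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def nudge_until_merged(fetch, iids, attempts=30, interval=20.0, sleep=time.sleep):
    """Re-request each stuck MR until it reports state 'merged' or attempts run out.

    fetch(iid) must return the MR as a dict. Returns the set of iids still not merged.
    """
    pending = set(iids)
    for _ in range(attempts):
        for iid in sorted(pending):  # sorted() copies, so discarding while looping is safe
            if fetch(iid).get("state") == "merged":
                pending.discard(iid)
        if not pending:
            break
        sleep(interval)
    return pending
```

A call might look like nudge_until_merged(lambda iid: get_mr("https://gitlab.example.com", token, 42, iid), stuck_iids), where the URL, token, and project ID 42 are placeholders. As the issue notes, this still depends on an external process running the loop.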
Test results (GitLab CE 18.9.3)
| Approach | Batch size (MRs) | Merged | Stuck | Merge rate |
|---|---|---|---|---|
| Baseline (approve + auto-merge) | 30 | 12 | 18 | 40% |
| + retry loop (30x20s) | 30 | 19 | 11 | 63% |
| + poll GET + retry loop | 50 | 34 | 16 | 68% |
| + after_script GET nudge | 50 | 39 | 11 | 78% |
| + after_script list API recheck | 50 | 40 | 10 | 80% |
Environment
- GitLab CE 18.9.3, single-node (8 vCPU, 16 GB RAM)
- PostgreSQL 16 (managed), Redis (local container)
- Sidekiq: 4 processes x 20 concurrency
Proposal
When mark_as_unchecked resets merge_status for MRs that have auto_merge_enabled=true, it should also re-enqueue AutoMergeProcessWorker for those MRs. This would work like the can_be_merged callback added in !217382 (merged). Auto-merge MRs would then be re-evaluated after their mergeability status is invalidated, instead of waiting for an external trigger.
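The proposal can be checked against a toy model. This is a hypothetical sketch, not a patch, and none of the names below are GitLab's. The model assumes the re-enqueued job runs after, or itself triggers, a mergeability recheck, modeled by check_mergeability, which here always succeeds. Without that recheck, the re-enqueued worker would see unchecked again and return, exactly as in the failure mechanism.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class MR:
    # Hypothetical, simplified merge request record.
    iid: int
    merge_status: str = "can_be_merged"
    auto_merge_enabled: bool = True
    merged: bool = False


def check_mergeability(mr):
    # Assumed recheck: in this model every rechecked MR becomes mergeable.
    if mr.merge_status == "unchecked":
        mr.merge_status = "can_be_merged"


def mark_as_unchecked(mrs, merged_iid, queue):
    for mr in mrs:
        if mr.iid != merged_iid and not mr.merged:
            mr.merge_status = "unchecked"
            if mr.auto_merge_enabled:
                queue.append(mr.iid)  # proposed change: re-enqueue the auto-merge job


def auto_merge_process(mr, mrs, queue):
    if mr.merged:
        return
    check_mergeability(mr)  # assumption: the re-enqueued job leads to a recheck
    if mr.merge_status != "can_be_merged":
        return
    mr.merged = True
    mark_as_unchecked(mrs, mr.iid, queue)


def simulate(n):
    mrs = [MR(i) for i in range(1, n + 1)]
    by_iid = {mr.iid: mr for mr in mrs}
    queue = deque(mr.iid for mr in mrs)
    jobs = 0
    while queue:
        auto_merge_process(by_iid[queue.popleft()], mrs, queue)
        jobs += 1
    return mrs, jobs
```

In this model every MR merges, but the queue also receives redundant jobs: with three MRs, six jobs run. A real change would probably want some form of job deduplication so that each merge does not enqueue a job for every remaining sibling.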
What is the current bug behavior?
MRs with auto-merge enabled get stuck in merge_status=unchecked when concurrent merges trigger mark_as_unchecked. No timeout or recovery mechanism exists.
What is the expected correct behavior?
All MRs with auto-merge enabled should eventually merge after their pipelines succeed, regardless of batch size (tested with 30-50 MRs targeting the same branch).
Relevant logs and/or screenshots
See test results table above.
GitLab version
18.9.3-ee (CE features only)