Add diagnostic logging for stuck auto-merge MRs
What does this MR do and why?
Adds diagnostic logging, behind the default-off auto_merge_diagnostic_logging ops flag, to pin down the root cause of intermittently stuck auto-merge merge requests (#596177) — MRs that stay unmerged after a green pipeline and only merge once the page is loaded.
The stuck state happens when the auto-merge worker reads the CI mergeability check as checking even though the pipeline has already succeeded, then bails with no retry. The pipeline-success trigger runs in run_after_commit and the worker is pinned to that WAL location, so a stale pipeline-status read is unlikely — the prime suspect is the pipeline_creating? Redis flag, which forces checking regardless of the database (and is therefore immune to the previously-tried, now-reverted "force primary reads" change). This logging confirms that, or points instead at a stale/mismatched head_pipeline.
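The suspected failure mode can be illustrated with a minimal sketch. It assumes a simplified model in which the creation flag is consulted before the pipeline status; FakeMergeRequest and ci_check_status are hypothetical stand-ins, not the actual service code, and the status mapping is an assumption.

```ruby
# Hypothetical stand-in for a merge request; not the real GitLab model.
FakeMergeRequest = Struct.new(:pipeline_creating, :head_pipeline_status, keyword_init: true)

# Simplified model of the CI mergeability check (assumed ordering of the checks).
def ci_check_status(mr)
  # The Redis-backed flag is read first, so it overrides whatever the database says.
  return :checking if mr.pipeline_creating

  case mr.head_pipeline_status
  when "success" then :success
  when "created", "pending", "running" then :checking
  else :failure
  end
end

stuck   = FakeMergeRequest.new(pipeline_creating: true,  head_pipeline_status: "success")
healthy = FakeMergeRequest.new(pipeline_creating: false, head_pipeline_status: "success")

puts ci_check_status(stuck)   # prints "checking" although the pipeline succeeded
puts ci_check_status(healthy) # prints "success"
```

In this model a primary read of the pipeline status cannot help, because the early return never reaches the database-backed branch.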
What it logs (flag-gated, per-project, non-success only)
- MergeRequests::Mergeability::CheckCiStatusService: when CI is checking/failure for an auto-merge MR, logs auto_merge_ci_diagnostic with pipeline_creating, the raw pipeline_creation_requests, head_pipeline_id vs diff_head_pipeline_id, head_pipeline_status, merge_status, and diff_head_sha.
- AutoMergeProcessWorker: logs auto_merge_worker_invoked with the trigger source and triggering_pipeline_ids, to correlate the bailing run with the trigger that fired it.
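The gating described above (flag on, auto-merge MR, non-success only) can be sketched roughly as follows. DiagMR, ci_diagnostic_payload, and the flag_enabled argument are hypothetical names for illustration; the real service reads the flag and the merge request state through GitLab's own APIs, which are not reproduced here.

```ruby
require "json"

# Hypothetical stand-in for a merge request; fields mirror the logged keys.
DiagMR = Struct.new(
  :auto_merge_enabled, :pipeline_creating, :pipeline_creation_requests,
  :head_pipeline_id, :diff_head_pipeline_id, :head_pipeline_status,
  :merge_status, :diff_head_sha,
  keyword_init: true
)

# Returns the diagnostic payload, or nil when nothing should be logged.
# flag_enabled stands in for the per-project auto_merge_diagnostic_logging check.
def ci_diagnostic_payload(mr, ci_check_status:, flag_enabled:)
  return nil unless flag_enabled           # no behaviour change while the flag is off
  return nil unless mr.auto_merge_enabled  # only auto-merge MRs are of interest
  return nil if ci_check_status == :success

  {
    message: "auto_merge_ci_diagnostic",
    ci_check_status: ci_check_status,
    pipeline_creating: mr.pipeline_creating,
    pipeline_creation_requests: mr.pipeline_creation_requests,
    head_pipeline_id: mr.head_pipeline_id,
    diff_head_pipeline_id: mr.diff_head_pipeline_id,
    head_pipeline_status: mr.head_pipeline_status,
    merge_status: mr.merge_status,
    diff_head_sha: mr.diff_head_sha
  }
end

mr = DiagMR.new(
  auto_merge_enabled: true, pipeline_creating: true, pipeline_creation_requests: [],
  head_pipeline_id: 10, diff_head_pipeline_id: 10, head_pipeline_status: "success",
  merge_status: "can_be_merged", diff_head_sha: "abc123"
)

payload = ci_diagnostic_payload(mr, ci_check_status: :checking, flag_enabled: true)
puts JSON.generate(payload) unless payload.nil?
```

Returning nil for the skipped cases keeps the decision separate from the logger call, which makes the flag-on and flag-off paths easy to test in isolation.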
How to read it
| Log shows | Verdict |
|---|---|
| ci_check_status=checking, head_pipeline_status=success, pipeline_creating=true | stale pipeline_creating? Redis flag |
| pipeline_creation_requests has a lingering in_progress entry | orphaned creation request |
| head_pipeline_status running + triggering_pipeline_ids differ from diff_head_pipeline_id | stale/mismatched head pipeline |
| diff_head_pipeline_id=nil while head_pipeline_id is set | sha mismatch |
No behaviour change while the flag is off.
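The verdict table above can be turned into a small triage helper for captured log entries. This is a sketch under assumptions: the entry is a hash that merges fields from both log lines, pipeline_creation_requests is assumed to be an array of hashes with a :status key (the real shape may differ), and "differ" is read as "diff_head_pipeline_id is not among triggering_pipeline_ids". The verdicts method is hypothetical and returns every matching verdict, since the first two rows can overlap.

```ruby
# Maps one merged diagnostic entry to the verdicts from the table.
def verdicts(entry)
  found = []
  requests = entry[:pipeline_creation_requests] || []
  triggering_ids = entry[:triggering_pipeline_ids] || []

  if entry[:ci_check_status] == "checking" &&
      entry[:head_pipeline_status] == "success" &&
      entry[:pipeline_creating] == true
    found << "stale pipeline_creating? Redis flag"
  end

  # Assumed shape: each request is a hash with a :status key.
  if requests.any? { |request| request[:status] == "in_progress" }
    found << "orphaned creation request"
  end

  if entry[:head_pipeline_status] == "running" &&
      triggering_ids.any? &&
      !triggering_ids.include?(entry[:diff_head_pipeline_id])
    found << "stale/mismatched head pipeline"
  end

  if entry[:diff_head_pipeline_id].nil? && !entry[:head_pipeline_id].nil?
    found << "sha mismatch"
  end

  found
end

entry = {
  ci_check_status: "checking", head_pipeline_status: "success", pipeline_creating: true,
  pipeline_creation_requests: [], head_pipeline_id: 10, diff_head_pipeline_id: 10,
  triggering_pipeline_ids: [10]
}
puts verdicts(entry).inspect
```

An empty result means none of the four rows matched, which would itself be a useful signal that the root cause lies elsewhere.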
Rollout
- Enable for the affected project only, then watch for an auto_merge_ci_diagnostic line on the next stuck MR: /chatops run feature set auto_merge_diagnostic_logging true --project=datahow/projects/dhl3/devops/deployment-dhl-multi
- Remove the flag and this logging once the root cause is confirmed.
Related
- Related to #596177
MR acceptance checklist
- Tests added (flag on / flag off)
- Behind a feature flag (auto_merge_diagnostic_logging, default off)
- No documentation changes needed