Merge trains can get stuck on unexpected errors
Summary
When unexpected errors occur in MergeTrains::RefreshMergeRequestService, the merge train stalls in a Sidekiq job with no feedback for users. See our Sentry (internal link) for examples of errors that are happening there.
See also this Zendesk ticket (internal link).
Possible fixes
- Handle the known exceptions from our Sentry (internal link) in merge trains. These are all examples of specific code that needs to be more robust.
- Make merge trains more robust to unexpected errors: rescue `StandardError` in `MergeTrains::RefreshMergeRequestService`. If the parent refresh worker is on its last retry, capture and log the exception (internally); otherwise, re-raise the error (so we get retries). Show a system note with a generic "Internal error" message along with the correlation ID, e.g. "Merge request removed from the train due to an internal error (correlation ID ABCD)".
- This needs to be done carefully. For example, what should happen if there is a statement timeout when marking a train car as merged?
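The rescue-and-retry behavior described above could be sketched roughly as follows. This is not GitLab's actual code: the `last_retry:` flag, `refresh!` step, and `notes` collection are simplified stand-ins for the worker's retry bookkeeping and the real system-note machinery.

```ruby
# Minimal sketch (assumptions noted above): rescue StandardError,
# re-raise while retries remain, and only log / leave a system note
# on the final retry.
module MergeTrains
  class RefreshMergeRequestService
    attr_reader :notes

    def initialize(last_retry:, correlation_id:)
      @last_retry = last_retry         # would come from Sidekiq retry count
      @correlation_id = correlation_id
      @notes = []                      # stand-in for real system notes
    end

    def execute
      refresh! # hypothetical refresh step that may raise unexpectedly
    rescue StandardError
      raise unless @last_retry # re-raise so Sidekiq retries the job

      # Final retry: give the user feedback instead of stalling silently.
      @notes << "Merge request removed from the train due to an " \
                "internal error (correlation ID #{@correlation_id})"
    end

    private

    def refresh!
      raise "boom" # simulate an unexpected failure
    end
  end
end
```

On earlier attempts the error propagates, so Sidekiq's retry/backoff still applies; only the exhausted job swallows the exception and posts the note.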