Stuck Merge Train because of GitError

Background

In the previous issue, we found a bug that caused merge trains to get stuck. It was triggered by an internal Gitlab::Git::CommandError raised in Branches::DeleteService, and we fixed it by rescuing that error.

Summary

On 2020-09-23, the merge train got stuck again. (internal link)

While investigating the logs and Sentry errors, I found the following:

  1. 10:58:15.631 - AutoMergeProcessWorker / 71774644 was added to the merge train
  2. 10:58:19.828 - AutoMergeProcessWorker finished and the pipeline was running
  3. 11:06:28.235 - The pipeline succeeded and enqueued AutoMergeProcessWorker, which tried to merge 71774644
  4. 11:07:08.019
    • AutoMergeProcessWorker called MergeTrains::RefreshMergeRequestsService -> MergeTrains::RefreshMergeRequestService
    • In MergeTrains::RefreshMergeRequestService#merge!, MergeRequests::MergeService succeeded, but merge_train.finish_merge! did not.
    • 71774644 was successfully merged, but cleanup_ref failed.
    • As a result, MergeTrains::RefreshMergeRequestsService stopped processing and the subsequent merge requests were stuck.

Other errors:

Solution Proposal

We can rescue Gitlab::Git::Repository::GitError in MergeTrains::RefreshMergeRequestsService, or in another suitable place.
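
As a rough illustration of the idea, here is a minimal, self-contained sketch. It does not use the real GitLab classes: FakeGitError, FakeMergeRequest, refresh_train and build_train are hypothetical stand-ins, and the loop is a simplification of what MergeTrains::RefreshMergeRequestsService does. It only shows why an unrescued error on one merge request stops the remaining ones from being processed, and how rescuing it lets the loop continue.

```ruby
# Hypothetical stand-in for Gitlab::Git::Repository::GitError.
class FakeGitError < StandardError; end

# Hypothetical merge request on the train.
# `cleanup_fails` simulates a failure in the cleanup step that runs after the merge.
FakeMergeRequest = Struct.new(:id, :cleanup_fails, :merged) do
  def merge!
    # The merge itself succeeds...
    self.merged = true
    # ...but the follow-up cleanup may still raise.
    raise FakeGitError, "cleanup_ref failed for #{id}" if cleanup_fails
  end
end

# Simplified refresh loop (an assumption, not the actual service code).
# With rescue_git_errors: false, the first FakeGitError aborts the loop.
# With rescue_git_errors: true, the error is recorded and the loop continues.
def refresh_train(merge_requests, rescue_git_errors:)
  errors = []
  merge_requests.each do |mr|
    begin
      mr.merge!
    rescue FakeGitError => e
      raise unless rescue_git_errors
      errors << e.message
    end
  end
  errors
end

# Two merge requests: the first one merges but fails during cleanup.
def build_train
  [
    FakeMergeRequest.new(1, true, false),
    FakeMergeRequest.new(2, false, false)
  ]
end

# Without the rescue, the error propagates and the second merge request is never processed.
stuck_train = build_train
begin
  refresh_train(stuck_train, rescue_git_errors: false)
rescue FakeGitError => e
  puts "train stuck: #{e.message}"
end
puts "merged without rescue: #{stuck_train.map(&:merged).inspect}"

# With the rescue, the error is recorded and the second merge request is still processed.
healthy_train = build_train
recorded = refresh_train(healthy_train, rescue_git_errors: true)
puts "merged with rescue: #{healthy_train.map(&:merged).inspect}"
puts "recorded errors: #{recorded.inspect}"
```

In a real fix, the rescued error would presumably also be reported (for example to Sentry) so that the failed cleanup remains visible; where exactly to rescue is still open, as noted above.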