Stuck Merge Train because of GitError

Background

In the previous issue, we found a bug that caused merge trains to get stuck. It was triggered by an internal Gitlab::Git::CommandError from Branches::DeleteService, and we fixed it by rescuing that error.

Summary

On 2020-09-23, the merge train got stuck again. (internal link)

While investigating the logs and Sentry errors, I found the following:

  • Example merge request iid/id: 63122/71774644
  • Example log: https://log.gprd.gitlab.net/goto/2034c97731c1b0b7803e72b7408f8186
  • Example sentry error: https://sentry.gitlab.net/gitlab/gitlabcom/issues/1789286/events/34044239/
  1. 10:58:15.631 - AutoMergeProcessWorker: 71774644 was added to the merge train
  2. 10:58:19.828 - AutoMergeProcessWorker finished and the pipeline was running
  3. 11:06:28.235 - The pipeline succeeded and enqueued AutoMergeProcessWorker, which tried to merge 71774644
  4. 11:07:08.019
    • AutoMergeProcessWorker called MergeTrains::RefreshMergeRequestsService -> MergeTrains::RefreshMergeRequestService
    • In MergeTrains::RefreshMergeRequestService#merge!, MergeRequests::MergeService succeeded, but merge_train.finish_merge! did not.
    • 71774644 was successfully merged, but cleanup_ref failed.
    • As a result, MergeTrains::RefreshMergeRequestsService stopped processing and the subsequent merge requests were stuck.
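
The failure mode in step 4 can be modelled with a small, self-contained Ruby sketch. This is not the actual GitLab code: FakeGitError, TrainEntry, refresh_all and the ids 1 to 3 are hypothetical stand-ins, and the loop only assumes that the train is refreshed entry by entry with no error handling around the post-merge step.

```ruby
# Hypothetical stand-in for the Git error seen in Sentry; not the real GitLab class.
class FakeGitError < StandardError; end

# Simplified model of one merge train entry.
# `cleanup_fails` simulates cleanup_ref raising after a successful merge.
TrainEntry = Struct.new(:id, :cleanup_fails, :merged, :refreshed) do
  def merge!
    self.merged = true # the merge itself succeeds
    finish_merge!      # the post-merge step can still raise
  end

  def finish_merge!
    raise FakeGitError, "cleanup_ref failed for #{id}" if cleanup_fails
  end
end

# Refreshes entries in order with no error handling (assumed shape of the loop).
def refresh_all(entries)
  entries.each do |entry|
    entry.merge!
    entry.refreshed = true
  end
end

train = [
  TrainEntry.new(1, true, false, false),  # merges, then cleanup fails
  TrainEntry.new(2, false, false, false), # never reached
  TrainEntry.new(3, false, false, false)  # never reached
]

begin
  refresh_all(train)
rescue FakeGitError => e
  puts "train stopped: #{e.message}"
end

train.each { |e| puts "id=#{e.id} merged=#{e.merged} refreshed=#{e.refreshed}" }
```

In this model the first entry ends up merged but not marked as refreshed, and the exception escapes the loop before entries 2 and 3 are touched, which matches the stuck state described above.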

Other errors:

  • 71868572 - https://sentry.gitlab.net/gitlab/gitlabcom/issues/1789286/events/34050848/
  • 71641115 - https://sentry.gitlab.net/gitlab/gitlabcom/issues/1789286/events/34056676/
  • 71871151 - https://sentry.gitlab.net/gitlab/gitlabcom/issues/1789286/events/34056858/

Solution Proposal

We can rescue Gitlab::Git::Repository::GitError in MergeTrains::RefreshMergeRequestsService or in another suitable place.
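
A minimal sketch of the idea, assuming the rescue sits around the per-merge-request refresh so that one failure is recorded and the loop carries on. FakeGitError, refresh_one and refresh_all are hypothetical names used only for illustration; the real service structure may call for a different placement.

```ruby
require "logger"

# Hypothetical stand-in for Gitlab::Git::Repository::GitError.
class FakeGitError < StandardError; end

# Hypothetical per-merge-request refresh: raises for ids listed in `failing_ids`.
def refresh_one(id, failing_ids)
  raise FakeGitError, "cleanup_ref failed for #{id}" if failing_ids.include?(id)

  id
end

# Refreshes every id in order. A Git error on one id is logged and recorded,
# and does not stop the remaining ids from being processed.
def refresh_all(ids, failing_ids, logger)
  result = { refreshed: [], failed: [] }

  ids.each do |id|
    begin
      refresh_one(id, failing_ids)
      result[:refreshed] << id
    rescue FakeGitError => e
      logger.warn("merge train refresh failed: #{e.message}")
      result[:failed] << id
    end
  end

  result
end

result = refresh_all([1, 2, 3], [1], Logger.new($stderr))
puts result.inspect
```

Rescuing only the specific Git error class, rather than StandardError, keeps unrelated bugs visible. Logging the failure means the affected merge request can still be followed up after the train moves on.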
