Always Refresh Merge Requests in Merge Train from the beginning
Problem
We observed yet another case in which the Merge Train in www-gitlab-com got stuck because of an exception. This time, a database statement timeout caused an MR in the train not to be refreshed properly.
Because there are edge cases in which things can go wrong during a refresh, and catching a new exception every time a train gets stuck is cumbersome, we should slightly re-architect the merge request refresh mechanism to make it more resilient and self-recovering.
Current refresh logic
- An MR is added to a train
- The system refreshes the added MR (current pointer) and the following MRs (up to DEFAULT_CONCURRENCY, which is 20)
The problem with this mechanism is that, if the refresh was not done correctly for any MR between the initial MR and the current pointer, the merge train cannot recover by itself.
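The failure mode can be illustrated with a minimal Python sketch. This is not the GitLab implementation: the train is modelled as a plain list, `mrs_to_refresh_current` and the `stale` set are hypothetical names, and DEFAULT_CONCURRENCY is assumed to act as a cap on how many MRs one refresh touches.

```python
DEFAULT_CONCURRENCY = 20  # value stated in this issue

def mrs_to_refresh_current(train, added_index):
    """Current logic (sketch): start at the newly added MR, the current pointer."""
    return train[added_index:added_index + DEFAULT_CONCURRENCY]

train = ["mr-1", "mr-2", "mr-3"]
stale = {"mr-1"}  # assumption: mr-1's earlier refresh failed, e.g. on a statement timeout

train.append("mr-4")  # a new MR is added to the train
refreshed = mrs_to_refresh_current(train, train.index("mr-4"))
stale -= set(refreshed)  # only refreshed MRs stop being stale

print(refreshed)  # ['mr-4']
print(stale)      # {'mr-1'}
```

Because the refresh window starts at the added MR, `mr-1` is never revisited and stays stale however many MRs are added later.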
New refresh logic
- An MR is added to a train
- The system refreshes the initial MR (the first pointer) and the following MRs (up to DEFAULT_CONCURRENCY, which is 20)
The new refresh mechanism always refreshes from the initial MR whenever a new MR is added. This means that even if something went wrong while refreshing the initial MR, a recovery refresh happens the next time an MR is added.
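A minimal Python sketch of the proposed behaviour, under the same caveats: it is an illustration rather than the GitLab implementation, the train is modelled as a plain list, `mrs_to_refresh_new` and the `stale` set are hypothetical names, and DEFAULT_CONCURRENCY is assumed to cap how many MRs one refresh touches.

```python
DEFAULT_CONCURRENCY = 20  # value stated in this issue

def mrs_to_refresh_new(train):
    """New logic (sketch): always start at the initial MR of the train."""
    return train[:DEFAULT_CONCURRENCY]

train = ["mr-1", "mr-2", "mr-3"]
stale = {"mr-1"}  # assumption: mr-1's earlier refresh failed, e.g. on a statement timeout

train.append("mr-4")  # a new MR is added to the train
refreshed = mrs_to_refresh_new(train)
stale -= set(refreshed)  # refreshing from the start also covers mr-1

print(refreshed)  # ['mr-1', 'mr-2', 'mr-3', 'mr-4']
print(stale)      # set()
```

The trade-off in this sketch is that every addition re-refreshes MRs that may already be up to date, in exchange for the train recovering without manual intervention.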