Fix merge train unnecessarily retries pipeline by a race condition
Summary
We've recently shipped !23395 (merged) for fixing state transition bugs on merge train. It's verified as a stable solution, however, we've discovered that there is another bug has to be resolved.
Problem
The problem is that pipelines for merge train are re-triggered before the system tries to merge the MR. It can be illustrated as the following:
- MR1 is added to a merge train. A pipeline for merge train is created.
- MR2 is added to a merge train. A pipeline for merge train is created.
- MR1's pipeline finished and MR1 is merged.
- MR2's pipeline finished but re-triggered (i.e. starts over), when the retriggered pipeline finished, MR2 is merged.
The reason of this behavior is, when the system checks merge train history, it only checks merge_commit_sha
, which is set only after merge request has been merged, however, we only check
- The merge train starts merging MR1. When
git-merge
happens, it hooksPostReceive
sidekiq worker to refresh the other MRs asynchronously. -
MergeTrains::CheckStatusService
checks if the new revision is a part of merge train history. However, at this moment,merge_commit_sha
is not set becauseMergeRequests::MergeService
is not complete yet (i.e. there is a gap betweengit-merge
and setting the actual merge sha tomerge_request.merge_commit_sha
). Thus,CheckStatusService
thinks the new revision is not part of the history and mark the next MR as stale (i.e. re-trigger pipeline)
Reproduce
This problem is reproducible in local environment. Here is the steps:
- Add three MRs to train in a short span.
- Observe the pipeline count of each MRs. MR1 has 2 pipelines (correct), MR2 has 3 pipelines (should be 2), MR3 has 4 pipelines (should be 2).
Solution
- We evaluate
in_progress_merge_commit_sha
when we check train history. This correctly works formerging
state of merge trains. - We make sure that there is always overlap between
in_progress_merge_commit_sha
andmerge_commit_sha
, and either of column is evaluated during the train history check.
This Mr fixes #196133 (closed).
QA
I've confirmed on local machine that I cannot reproduce this problem with this fix.
Screenshots
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry - [-] Documentation (if required)
-
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. - [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team