Skip to content

Fix merge train unnecessarily retries pipeline by a race condition

Summary

We've recently shipped !23395 (merged) for fixing state transition bugs on merge train. It's verified as a stable solution, however, we've discovered that there is another bug has to be resolved.

Problem

The problem is that pipelines for merge train are re-triggered before the system tries to merge the MR. It can be illustrated as the following:

  • MR1 is added to a merge train. A pipeline for merge train is created.
  • MR2 is added to a merge train. A pipeline for merge train is created.
  • MR1's pipeline finished and MR1 is merged.
  • MR2's pipeline finished but re-triggered (i.e. starts over), when the retriggered pipeline finished, MR2 is merged.

The reason of this behavior is, when the system checks merge train history, it only checks merge_commit_sha, which is set only after merge request has been merged, however, we only check

  1. The merge train starts merging MR1. When git-merge happens, it hooks PostReceive sidekiq worker to refresh the other MRs asynchronously.
  2. MergeTrains::CheckStatusService checks if the new revision is a part of merge train history. However, at this moment, merge_commit_sha is not set because MergeRequests::MergeService is not complete yet (i.e. there is a gap between git-merge and setting the actual merge sha to merge_request.merge_commit_sha). Thus, CheckStatusService thinks the new revision is not part of the history and mark the next MR as stale (i.e. re-trigger pipeline)

Reproduce

This problem is reproducible in local environment. Here is the steps:

  1. Add three MRs to train in a short span.
  2. Observe the pipeline count of each MRs. MR1 has 2 pipelines (correct), MR2 has 3 pipelines (should be 2), MR3 has 4 pipelines (should be 2).

Solution

  • We evaluate in_progress_merge_commit_sha when we check train history. This correctly works for merging state of merge trains.
  • We make sure that there is always overlap between in_progress_merge_commit_sha and merge_commit_sha, and either of column is evaluated during the train history check.

This Mr fixes #196133 (closed).

QA

I've confirmed on local machine that I cannot reproduce this problem with this fix.

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Shinya Maeda

Merge request reports