Merge Request stuck with abandoned active car after branch retargeting

The Problem

When using branch retargeting to set up a second dependent merge request, the second MR can get stuck on a abandoned train with no available actions:

The Cause and How to Reproduce

The problem is caused by the dependent MR being added to a train that targets the first MRs source branch, which gets deleted when the first MR is merged. If the first MR isn't configured to delete it's source branch, the consequences may be different but it is still some kind of issue.

Reproduction

Enable Merge Trains and MWCP for your project.
Set up MR1 with source branch mr-1-feature and target branch main, configured to delete source branch after merge.
Set up MR2 with source branch mr-2-addition, target branch mr-1-feature, and MR1 as a merge request dependency.
Review, approve, set MWCP (ATMTWCP) for MR2. It will not start the merge_train due to the unmet dependency
Review, approve, set MWCP (ATMTWCP) for MR1. It will merge normally, and delete the source branch.
MR2 will now start a merge train automatically and get stuck

At this point MR2 is stuck with an associated MergeTrain::Car that points to a target branch that does not exist.

The automatic retargeting of MR2:

The MergeTrain::Car created for MR2 that hasn't been deleted:

[ gprd ] production> mr.merge_train_car
=> 
#<MergeTrains::Car:0x00007f760ed74638
 id: 3977194,
 merge_request_id: 325992478,
 user_id: 3841241,
 created_at: Mon, 09 Sep 2024 18:07:13.379568000 UTC +00:00,
 updated_at: Mon, 09 Sep 2024 18:07:13.379568000 UTC +00:00,
 target_project_id: 278964,
 target_branch: "log-all-ci-job-token-usage",
 status: 0,
 merged_at: nil,
 duration: nil,
 pipeline_id: nil,
 index: nil>

MR2 is now considered to be on a train already:

[ gprd ] production> mr.mergeable_state?
=> true
[ gprd ] production> mr.mergeable_ci_state?
=> true
[ gprd ] production> mr.diff_head_pipeline_success?
=> true
[ gprd ] production> mr.mergeable?
=> true
[ gprd ] production> AutoMerge::MergeTrainService.new(mr.project, mr.merge_user).available_for?(mr)
=> true
[ gprd ] production> mr.on_train?
=> true
[ gprd ] production> mr.merge_train_car.status
=> 0
[ gprd ] production> mr.merge_train_car.active?
=> true

And so the AutoMerge::MergeTrainService silently no-ops it's execution for this MR: (source)

#
# Note: This service is called via metaprogramming in AutoMergeService
# which is triggered by the AutoMergeProcessWorker when a pipeline completes
#
module AutoMerge
  class MergeTrainService < AutoMerge::BaseService
    extend Gitlab::Utils::Override

    override :execute
    def execute(merge_request)
      # No-op if already on the train. Especially important because assigning a
      # duplicate has_one association with build_#{association} below would
      # permanently delete any pre-existing one, regardless of whether the new
      # one eventually gets saved. Just destroying a car is not safe and can
      # leave the train in an inconsistent state. See #cancel and #abort for the
      # extra steps necessary.
      return if merge_request.on_train?

      merge_request.build_merge_train_car(
        user: current_user,
        target_project: merge_request.target_project,
        target_branch: merge_request.target_branch
      )
      super do
        SystemNoteService.merge_train(merge_request, project, current_user, merge_request.merge_train_car)
      end
    end

A Proposal

Add a MergeTrain(::Car) cleanup service and add it to the EE::Branches::DeleteService, to destroy all MergeTrain::Car records targeting a deleted branch.
Add target branch handling to MergeRequest#on_train? and/or AutoMerge::MergeTrainService#execute to replace the train car if MergeRequest#target_branch doesn't match MergeTrain::Car#target_branch.

My current preference is Option 1, because leaving Car records hanging around that point to nowhere is kind of sloppy and will reflect an inaccurate state of a MergeRequest that appears to be on-train when it is in fact going nowhere. It's also more of a clean reset of the MR state, and we can be more sure of the immediate mergeability of the MR, rather than depending on merge checks that passed back when we were targeting a different branch.

Option 2, alone, would technically work but would be somewhat jankier because it would be like waiting until the last minute to update the state of the MR to reflect the existing state of the Project. It also has the aforementioned risks of not re-running merge checks after the target branch changes. This has some mild implications for git mergability, and also serious implications for target-based features like Approval Rules.

Caveat

The workflow I'm recommending in Option 1 sort of cuts against what may be some perceived efficiency benefits of having the second MR target the first. While it will still be easier to run CI while the MRs are being worked on and be sure of the results, the "reset" we'd be doing of the MR state feels like a productivity setback when compared to how this works without merge trains, where you can merge one more or less right after another. This wouldn't make anything necessarily worse, but is part of the existing incongruity of the branch retargeting and MergeTrain features.

cc @phikai, @francoisrose, and @zillemarco who I've chatted with about these features recently.

Edited Sep 18, 2024 by Max Fan