MergeRequest#base_pipeline does not work with pipelines for merged results.

Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.

Summary

I did some investigation into lots of unrelated errors and fixes being reported in CodeQuality for this merge request and the culprit turns out to be the merge commit we use to run pipelines for merged results.

For the base pipeline, we are correctly using one pointing at the diff_base_sha, which is stable and unchanging throughout the lifetime of a merge request. Rebasing shouldn't affect this either - it's the SHA of the commit before the MR commits, i.e. one that lives in the target branch.

However the head_pipeline_sha is pointing to a merge commit, and the state of the merge commit includes everything in master at the time the merge request pipeline, and in turn the merge commit, is created.

So we're comparing a fixed point in the merge request target branch history to all changes in both source and target branches, combined, at the moment in time the MR head pipeline is created. That's why the report changes periodically; it changes with every new MR pipeline because every MR pipeline creates a new merge commit with the current state of the target branch.

Is this a problem?

I believe it's the underlying cause of our MR diff versioning being catastrophically broken sometimes: Merge request version linking to incorrect diff (#208794 - closed)

Helpful Illustrations

Detached Merge Request, i.e. commit-on-branch, pipelines

graph TD
  A["Commit A (master)"] --> B["Commit B (master, A+B, BASE_PIPELINE)"]
  B --> C["Commit C (master, A+B+C)"]
  B --> F["Commit F (feature, A+B+F, HEAD_PIPELINE)"]
  C --> D["Commit D (master, A+B+C+D)"]
  D --> E["Commit E (master, A+B+C+D+E)"]

Here, the comparison is clear between the before and after state of the commits in the merge request. The reports are out of date against current master (not including C/D/E), but consistent and stable for the merge request's context.

Merge Request Pipelines (project state is a combination of both branches)

graph TD
  A["Commit A (master)"] --> B["Commit B (master, A+B, BASE_PIPELINE)"]
  B --> C["Commit C (master, A+B+C)"]
  B --> F["Commit F (feature, A+B+F)"]
  C --> D["Commit D (master, A+B+C+D)"]
  F --> G["Commit Z (Ephemeral merge commit, A+B+C+D+F, HEAD_PIPELINE)"]
  D --> E["Commit E (master, A+B+C+D+E)"]
  D--> G

Here, the reports include newer information from master, which unintentionally makes things worse because our references to "before" and "after" the merge request are not in line with the actual states of the project as base_pipeline and head_pipeline.

Steps to reproduce

  1. Create a merge request to a project default branch from a feature branch
  2. Add commits to the default (target) branch while the merge request remains open.
  3. Check the head and base pipelines of the merge request and see that project state from the new target branch commits is incorporated into the merge request head pipeline.

Example Project

This happens on gitlab-org/gitlab, manifesting itself in several merge request widgets. I'll use CodeQuality as an example here because I'm familiar with it.

Proposal

Considered Options

To produce accurate reports:

  1. base_pipeline needs to point to commit D

OR

  1. head_pipeline needs to point to commit F

Option 1 has the benefit of including more recent information, and will be less out of date when they're first opened, as every MR pipeline creates a fresh merge commit with the current state of the target branch. However I did include commit E in this diagram to illustrate that in a large and/or fast-moving project (GitLab), it's likely that these widgets will become out-of-date relatively quickly. They'll also be very likely to change with every single new MR pipeline. Widgets can be refreshed by a simple click to run a new merge request pipeline.

Option 2 has the benefit of working in the exact same way detached merge request pipelines do. The meaning of a merge request diff widget will be constant throughout every configuration of GitLab CI. To update stale widget data, the merge request will need to be rebased on the target branch, triggering a new merge request pipeline.

These options were both evaluated by @dosuken123, who recommended we choose Option 1 due to large breaking changes being involved in changes the implementation and technical definition of head_pipeline

What is the current bug behavior?

Under the hood: MergeRequest#base_pipeline uses the MR #diff_base_sha and #head_pipeline uses a merged result, which incorporates changes from both source and target branches in the diff.

Visually: CodeQuality violations are reported from code already existing in the target branch that was committed after a merge request was opened.

What is the expected correct behavior?

Under the hood: When a merge request uses pipelines for merged results, MergeRequest#base_pipeline will be present and pointing to that commit directly before the ephemeral merge commit used by the merged result pipeline, instead of the merge request's diff base.

Visually: The only violations reported are relevant to the new commits being introduced in a merge request source branch.

Suggested Implementation

@dosuken123: Having said that, I wonder if we should extend the concept of base_pipeline or introduce a concept like "The pipeline for calculating delta to the head pipeline". It'd be like

  def pipeline_for_delta
    if actual_head_pipeline.merge_request_pipeline? || actual_head_pipeline.merge_train_pipeline?
      project.ci_pipelines
        .order(id: :desc)
        .find_by(sha: actual_head_pipeline.target_sha, ref: target_branch)
    else
      base_pipeline
    end
  end

I'd like to effectively implement this, but implement two separate methods to return pipelines of specific roles, and then have def base_pipeline just pick between them. For example:

class MergeRequest
  ...

  def diff_base_pipeline
    # This is the current implementation of `#base_pipeline`. We're effectively renaming it to `diff_base_pipeline`.
    @base_pipeline ||= project.ci_pipelines
      .order(id: :desc)
      .find_by(sha: diff_base_sha, ref: target_branch)
  end

  def merge_base_pipeline
    project.ci_pipelines
      .order(id: :desc)
      .find_by(sha: actual_head_pipeline.target_sha, ref: target_branch)
  end

  def base_pipeline
    if actual_head_pipeline.merge_request_pipeline? || actual_head_pipeline.merge_train_pipeline?
      merge_base_pipeline
    else
      diff_base_pipeline
    end
  end

This achieves the functionality of the Option 1 proposal, while the renaming and self-documentation of the new base_pipeline method makes it clearer that there are two types of "base" for different configurations of GitLab, and the criteria we use to choose between them for your merge request.

Decisions to make

  1. Is the logic here, as written in the example #pipeline_for_delta method, correct with regards to representing the "before" state of a merge request?
  2. If so, is the naming, as proposed in the three methods #diff_base_pipeline, #merge_base_pipeline and #base_pipeline, correct and self-descriptive to another developer?
Edited by 🤖 GitLab Bot 🤖