MergeRequest#base_pipeline does not work with pipelines for merged results.
Summary
I did some investigation into lots of unrelated errors and fixes being reported in CodeQuality for this merge request and the culprit turns out to be the merge commit we use to run pipelines for merged results.
For the base pipeline, we are correctly using one pointing at the diff_base_sha
, which is stable and unchanging throughout the lifetime of a merge request. Rebasing shouldn't affect this either - it's the SHA of the commit before the MR commits, i.e. one that lives in the target branch.
However the head_pipeline_sha
is pointing to a merge commit, and the state of the merge commit includes everything in master at the time the merge request pipeline, and in turn the merge commit, is created.
So we're comparing a fixed point in the merge request target branch history to all changes in both source and target branches, combined, at the moment in time the MR head pipeline is created. That's why the report changes periodically; it changes with every new MR pipeline because every MR pipeline creates a new merge commit with the current state of the target branch.
Is this a problem?
I believe it's the underlying cause of our MR diff versioning being catastrophically broken sometimes: Merge request version linking to incorrect diff (#208794 - closed)
Helpful Illustrations
Detached Merge Request, i.e. commit-on-branch, pipelines
graph TD
A["Commit A (master)"] --> B["Commit B (master, A+B, BASE_PIPELINE)"]
B --> C["Commit C (master, A+B+C)"]
B --> F["Commit F (feature, A+B+F, HEAD_PIPELINE)"]
C --> D["Commit D (master, A+B+C+D)"]
D --> E["Commit E (master, A+B+C+D+E)"]
Here, the comparison is clear between the before and after state of the commits in the merge request. The reports are out of date against current master (not including C
/D
/E
), but consistent and stable for the merge request's context.
Merge Request Pipelines (project state is a combination of both branches)
graph TD
A["Commit A (master)"] --> B["Commit B (master, A+B, BASE_PIPELINE)"]
B --> C["Commit C (master, A+B+C)"]
B --> F["Commit F (feature, A+B+F)"]
C --> D["Commit D (master, A+B+C+D)"]
F --> G["Commit Z (Ephemeral merge commit, A+B+C+D+F, HEAD_PIPELINE)"]
D --> E["Commit E (master, A+B+C+D+E)"]
D--> G
Here, the reports include newer information from master, which unintentionally makes things worse because our references to "before" and "after" the merge request are not in line with the actual states of the project as base_pipeline
and head_pipeline
.
Steps to reproduce
- Create a merge request to a project default branch from a feature branch
- Add commits to the default (target) branch while the merge request remains open.
- Check the head and base pipelines of the merge request and see that project state from the new target branch commits is incorporated into the merge request head pipeline.
Example Project
This happens on gitlab-org/gitlab, manifesting itself in several merge request widgets. I'll use CodeQuality as an example here because I'm familiar with it.
Proposal
Considered Options
To produce accurate reports:
-
base_pipeline
needs to point to commitD
OR
-
head_pipeline
needs to point to commitF
Option 1 has the benefit of including more recent information, and will be less out of date when they're first opened, as every MR pipeline creates a fresh merge commit with the current state of the target branch. However I did include commit E
in this diagram to illustrate that in a large and/or fast-moving project (GitLab), it's likely that these widgets will become out-of-date relatively quickly. They'll also be very likely to change with every single new MR pipeline. Widgets can be refreshed by a simple click to run a new merge request pipeline.
Option 2 has the benefit of working in the exact same way detached merge request pipelines do. The meaning of a merge request diff widget will be constant throughout every configuration of GitLab CI. To update stale widget data, the merge request will need to be rebased on the target branch, triggering a new merge request pipeline.
These options were both evaluated by @dosuken123, who recommended we choose Option 1 due to large breaking changes being involved in changes the implementation and technical definition of head_pipeline
What is the current bug behavior?
Under the hood: MergeRequest#base_pipeline
uses the MR #diff_base_sha
and #head_pipeline
uses a merged result, which incorporates changes from both source and target branches in the diff.
Visually: CodeQuality violations are reported from code already existing in the target branch that was committed after a merge request was opened.
What is the expected correct behavior?
Under the hood: When a merge request uses pipelines for merged results, MergeRequest#base_pipeline
will be present and pointing to that commit directly before the ephemeral merge commit used by the merged result pipeline, instead of the merge request's diff base.
Visually: The only violations reported are relevant to the new commits being introduced in a merge request source branch.
Suggested Implementation
@dosuken123: Having said that, I wonder if we should extend the concept of base_pipeline or introduce a concept like "The pipeline for calculating delta to the head pipeline". It'd be like
def pipeline_for_delta
if actual_head_pipeline.merge_request_pipeline? || actual_head_pipeline.merge_train_pipeline?
project.ci_pipelines
.order(id: :desc)
.find_by(sha: actual_head_pipeline.target_sha, ref: target_branch)
else
base_pipeline
end
end
I'd like to effectively implement this, but implement two separate methods to return pipelines of specific roles, and then have def base_pipeline
just pick between them. For example:
class MergeRequest
...
def diff_base_pipeline
# This is the current implementation of `#base_pipeline`. We're effectively renaming it to `diff_base_pipeline`.
@base_pipeline ||= project.ci_pipelines
.order(id: :desc)
.find_by(sha: diff_base_sha, ref: target_branch)
end
def merge_base_pipeline
project.ci_pipelines
.order(id: :desc)
.find_by(sha: actual_head_pipeline.target_sha, ref: target_branch)
end
def base_pipeline
if actual_head_pipeline.merge_request_pipeline? || actual_head_pipeline.merge_train_pipeline?
merge_base_pipeline
else
diff_base_pipeline
end
end
This achieves the functionality of the Option 1 proposal, while the renaming and self-documentation of the new base_pipeline
method makes it clearer that there are two types of "base" for different configurations of GitLab, and the criteria we use to choose between them for your merge request.
Decisions to make
- Is the logic here, as written in the example
#pipeline_for_delta
method, correct with regards to representing the "before" state of a merge request? - If so, is the naming, as proposed in the three methods
#diff_base_pipeline
,#merge_base_pipeline
and#base_pipeline
, correct and self-descriptive to another developer?