Merge request diffs are currently calculated by git diff target...source which compares HEAD of target with the merge base of target and source. This works well until changes from the target branch are merged in to the source branch, creating a complete mess of the diff.
Intended users
Developers
Further details
For a Developer using GitLab, code review is primarily reading the diff generated by GitLab and leaving feedback. If the diff contains changes that have already been merged to the target branch this is:
confusing to the reviewer because they will be seeing unrelated changes in the diff
time consuming for the reviewer because they will have to review a larger diff, and spend more time working out in their head what changes are actually being merged
time consuming for the author who must wait on a longer review cycle and reduce cycle time.
Merge refs will allow us to calculate a more accurate diff that doesn't have these issues.
Proposal
When a person is reading a merge request diff, a new diff mode will be available in the drop down master (HEAD) indicating that this diff is against the head of the target branch, not the merge base of the target and source branch.
Behind the scenes this will use the merge-ref feature implemented for CI.
Every time the target branch is updated the merge request diff becomes out of date which happens over 20 times a day for gitlab-ce.
Limitations
In this first iteration:
commenting will not be supported on the new master (HEAD) mode
instead of automatically recalculating the diff when commits are pushed to the source or target branch, automatically recalculate the diff the next time the merge request is loaded
In the future, the new diff mode will replace the existing diff mode once commenting support is implemented and performance improvements have been made.
Documentation
Testing
What does success look like, and how can we measure that?
GitLab should have accurate diffs that do not show misleading information. Although this is a feature, it is perceived as a bug by our customers who have noted that the diffs in GitHub and Bitbucket better meet their expectations. Thus, success is measured by no long receiving bug reports and support requests about diff accuracy.
James Ramsay (ex-GitLab)changed title from Add new merge request comparison mode using merge ref diffs to Update merge request comparisons to use merge ref diffs
changed title from Add new merge request comparison mode using merge ref diffs to Update merge request comparisons to use merge ref diffs
@jramsay I don't think the initial scope here would already change all comparison modes to use merge ref diffs! @oswaldo and I explored that for a bit in &854 (comment 161785578), and it's quite involved.
@oswaldo We already talked about this at length in &854 (closed), but it would be awesome if you can spend some more time looking into what the initial scope here could look like from a technical perspective, to help whoever will be working on this in %12.1!
For this first iteration we're planning to support the diff between the target branch HEAD and source branch HEAD as opposed to the current regular git diff target...source (which fetches the base of target and source to use in the diffing). For that, we're going to use the MergeRequest#merge_ref_head, which is the content of a merge between source and target branches. For more context, read https://blog.developer.atlassian.com/a-better-pull-request/.
As we're not planning to support diffing between different merge diff ref versions (for now), we're able to persist less. For instance, we won't need separate merge_request_diff row and merge_request_diff_files to cache the diff raw data, though we still need to cache the diff highlighting (which is considerably heavy for the BE to do on the fly).
Merge ref and updates
Currently we automatically "refresh" the merge ref HEAD at MergeRequests::MergeabilityCheckService, which BTW is called every time an user accesses the merge request page. It happens when the MergeRequest#merge_status goes from unchecked to can_be_merged.
As we've discussed in the links above, we'll need to be able to:
Cache highlighted diff (and refresh it when the merge ref HEAD gets updated)
For the current (persisted) diffs, we go through MergeRequests::ReloadDiffsService, which persists a new merge_request_diff row, updates discussion positioning and resets the cache at MergeRequests::RefreshService. It seems like we're able to drop the last two I mention from this service. We may consider having a separated MergeRequest#reload_merge_ref_diff which wraps this logic, but for the merge-ref diff instead.
We'd be able to either have this method being called at MergeabilityCheckService in case of success, or alternatively, considering it might get a bit time consuming (given we call it upon page load), we could move it to the RefreshService (after setting the MR as unchecked):
Pitfall: We might end up with an outdated diff if target-branch introduces a conflicting commit and the user sends a new commit as source branch HEAD. The newer commit from the user won't be at the merge-ref. That gets trickier as we might need to consider changing the diff strategy on the fly, or generate the merge-ref with conflicts!
Comparison
As mentioned earlier, the merge_ref_diff don't necessarily needs to be persisted, so we should be able to:
Note that it returns a Gitlab::Diff::FileCollection::Compare, which doesn't handle caching reading nor writing. Because of that, we might want to consider using a lower level class such as Gitlab::Git::Compare, which would receive almost the same arguments, but we'd be able to wrap it in a different file collection class (see Gitlab::Diff::FileCollection::MergeRequestDiff as a reference, we should be able to reuse some logic of it).
That's a brief overview, but I believe there's more involved and we might need to persist at least the merge_ref_commit_sha in merge_requests.
I feel that by the end of this iteration we'll have at least some duplicated diff logic given we'll keep the old persisted diff implementation running.
considering it might get a bit time consuming (given we call it upon page load), we could move it to the RefreshService (after setting the MR as unchecked):
@oswaldo Since the RefreshService is called every time an MR's target branch is updated, this would mean recalculating the merge ref of hundreds of open MRs every time someone pushes to master on gitlab-ce, for example I think that'll be too expensive.
Would it be an option to run reload_merge_ref_diff on demand when the merge ref diffs endpoint is actually requested and we find them to be out of date, while continuing to run the MergeabilityCheckService on demand every time any MR page/endpoint is loaded?
Would it be an option to run reload_merge_ref_diff on demand when the merge ref diffs endpoint is actually requested and we find them to be out of date, while continuing to run the MergeabilityCheckService on demand every time any MR page/endpoint is loaded?
@DouweM But then we'd need to make sure the endpoint is called at all, which we were not considering before, right?
Since the RefreshService is called every time an MR's target branch is updated, this would mean recalculating the merge ref of hundreds of open MRs every time someone pushes to master on gitlab-ce, for example I think that'll be too expensive.
That's right. I wonder what would be that limit (for Gitaly), because there's still the option to wrap it in a separate worker (e.g. MergeRefWorker), but it wouldn't make any lighter for Gitaly still, but for the client.
We agreed to disagree calculating the merge-ref only when the actual MR files get changed at the target-branch is an option (it'll lead to problems, ultimately not allowing https://gitlab.com/gitlab-org/gitlab-ce/issues/58496).
@DouweM Considering we won't recalculate discussion positions at reload_merge_ref_diff (for now), it'll only refresh the highlighting cache for the current merge_ref_commit_sha. Which might still be acceptable upon page load.
But then we'd need to make sure the endpoint is called at all, which we were not considering before, right?
@oswaldo But that's the earliest time we actually need the merge ref diff, isn't it? It'd basically be a lazily evaluated MergeRequest#merge_ref_diff. That means we'd spread the expensive lazy work over the initial request for the MR page (which calls MergeRequest#check_mergeability) and the second request for the actual merge ref diffs, instead of doing it all in the initial request for the MR page, at which point I don't think we actually need the merge ref diffs yet.
Considering we won't recalculate discussion positions at reload_merge_ref_diff (for now), it'll only refresh the highlighting cache for the current merge_ref_commit_sha. Which might still be acceptable upon page load.
With that, I mean we could actually call it inside MergeabilityCheckService.
@oswaldo Yeah, it won't be quite so expensive yet until we add discussions into the mix, so this may be good enough for now.
I think that once we add discussions into the mix, we should actually do this work in a background worker, and have the frontend poll until it we've actually generated and cached the data and updated the discussion positions, so that the web process will use less memory, and we won't hit Gitaly timeouts, or timeouts because there are just so many discussions to iterate over.
I think that once we add discussions into the mix, we should actually do this work in a background worker, and have the frontend poll until it we've actually generated and cached the data and updated the discussion positions, so that the web process will use less memory, and we won't hit Gitaly timeouts, or timeouts because there are just so many discussions to iterate over.
User accesses the main page, MergeabilityCheckService kicks in and update merge_status and the merge-ref
This service could kick-off a MergeRefWorker, or ReloadMergeRefDiffWorker
ReloadMergeRefDiffWorker could absorb merge_request.reload_merge_ref_diffwith the position updates
merge_request.merge_ref_update_ongoing? could know it's still scheduled/running via a jid persisted at merge_requests (then we're able to use that for the poll)
We'll need to consider how this will work with discussions in the mix, because for discussions.json to load with latest version, we should need merge_request.merge_ref_update_ongoing? == false. Though that's something to have in mind for a next iteration.
If we decide switching the strategy if there are conflicts, we could make merge_ref_diff method smart so that if the ref is outdated we fallback to the regular diff, something like:
Though, that would also mean the discussions will need to be on this diff as well (which the update we'd need to be able to handle beforehand at RefreshService).
@oswaldo I think falling back on the regular merge base diff is acceptable, but I think we should make that clear to the user by displaying a banner saying something like:
Until the merge conflicts are resolved, the contents of files and the effect of specific changes will be out of date with the target branch.
@oswaldo@jramsay To reduce the scope of the first iteration, and to increase the likelihood that we can actually ship something in 12.1, what do you think about ignoring diff discussions for the moment?
We could have this new option not be the default yet, and show a banner saying that discussions are not displayed for the moment.
@oswaldo Do you see other potential ways of reducing the scope here?
@oswaldo Once we add support for discussions, what do you think about not just having a single position on DiffNote, but rather a list/relation of positions corresponding to all of the MR versions/diffs the discussion was ever displayed on, and thus had its position traced to? That way, the discussion can be displayed on both merge_ref_diff, merge_request_diff, and any other MR version the user may select, and we could generate new positions for the merge_ref_diff in MergeabilityCheckService, without throwing away the position for the regular merge_request_diff we got through the RefreshService.
Do you see other potential ways of reducing the scope here?
I think taking the discussions out of the equation will make things more straightforward for a first iteration. I can't think of any other point for now. Nice call BTW.
Once we add support for discussions, what do you think about not just having a single position on DiffNote, but rather a list/relation of positions corresponding to all of the MR versions/diffs the discussion was ever displayed on, and thus had its position traced to? That way, the discussion can be displayed on both merge_ref_diff, merge_request_diff, and any other MR version the user may select, and we could generate new positions for the merge_ref_diff in MergeabilityCheckService, without throwing away the position for the regular merge_request_diff we got through the RefreshService.
@DouweM I lack some context for the PositionTracer, but it seems quite reasonable to me ATM.
what do you think about ignoring diff discussions for the moment?
@DouweM@oswaldo so would this new diff mode not support adding or viewing discussion initially? That sounds like a good way of reducing scope to me. It would be confusing if we created a temporary situation where comments might be in one place but not the other, and then have to migrate all the comments to the same diff when we get this working more fully.
With regards to the situation where there are conflicts, I need to give this some more thought.
so would this new diff mode not support adding or viewing discussion initially? That sounds like a good way of reducing scope to me.
@jramsay That's correct, and I'd argue that it shouldn't be the default view until we do implement discussion support.
It would be confusing if we created a temporary situation where comments might be in one place but not the other, and then have to migrate all the comments to the same diff when we get this working more fully.
How do you mean "might be in one place but not the other"?
How do you mean "might be in one place but not the other"?
@DouweM I was trying to say it would be confusing if a comment on the diff calculated on the merge base wasn't shown on the diff calculated from the merge-ref and vice versa. If commenting is only enabled on the merge base diff and not the merge ref diff, this isn't a problem.
James Ramsay (ex-GitLab)changed title from Update merge request comparisons to use merge ref diffs to Add minimal merge-ref based diff option to merge request comparisons
changed title from Update merge request comparisons to use merge ref diffs to Add minimal merge-ref based diff option to merge request comparisons
Thanks @oswaldo for pointing out that it would be bad to tackle this at the same time as integrating the caching changes for &1816 - deferring until that is complete, hopefully 12.7
@andr3 I checked on this yesterday and it was still blocked. Patrick was OOO for holidays when his MR had a small change requested on it, as well as some feedback requested from UX. He is back this week, so hopefully this week.
Just want to note that when the MR (!21026 (merged)) gets merged to resolve #29984 (closed), it won't be enabled by default since it's feature flagged. @oswaldo and I agreed on testing it out on gitlab-org/gitlab project first to see how it's going to perform before enabling it by default.
Not sure how it's blocking this issue though. We are already creating the merge ref which is needed to resolve this issue (based on how I understand it). The issue (#29984 (closed)) I'm working on is making the generation asynchronous. I'm probably just missing something here though.
If it is indeed blocking this, would the backend be unblocked if the referenced MR gets merged? Since it's feature flagged and disabled by default, we can merge it without any changes in UX yet.
Not sure how it's blocking this issue though. We are already creating the merge ref which is needed to resolve this issue (based on how I understand it). The issue (#29984 (closed)) I'm working on is making the generation asynchronous. I'm probably just missing something here though.
Ideally any persistence of the merge_ref_sha, or raw diffs representing it, should be done after the mergeability check process is completed, meaning either sync or async. So I'd imagine that this process would take place either at MergeRequestMergeabilityCheckWorker, or in a job scheduled by MergeRequestMergeabilityCheckWorker.
I'll make a call with Igor tomorrow to discuss a bit further, but persisting raw diffs (if we decide going on that direction) in sync doesn't sound like an option.
@m_gill@jramsay Since it may have an impact on 12.8 planning, I'll mention it here that I've created !23157 (merged) for the backend part of the first iteration, but I still need reviews from @oswaldo and @nick.thomas in order to evaluate the potential of its being merged. If/When it's merged, we can just create a separate frontend issue for #27008 (comment 260556736) and also extract some precious thoughts about further iterations (caching, discussions...) into specific issues. And then we can close this one
I see lots of great work being done on this issue - what is the likelihood it will make it into 12.8? https://gitlab.my.salesforce.com/00161000008ptPz would like to get an update on the expected availability.
@oheigre the issue is now in workflowverification, so this should be in %12.8. Please note it's a very rough first iteration, comment isn't supported and this diff mode won't be the default diff displayed.
What will be the reason for comments not being supported? It will be useless anyway, you want actually comment real changes and check those comments on gitlab, so what's the point of something that i can already do in IDE and tell my colleagues using other communication options? You should focus on releasing this feature with comments supported really.
What will be a workaround? Switch to this new diff, check changes, then switch back to old one only to comment? This is really bad UX.
It's already almost 4 years when this issue was created here gitlab-foss#15140 (closed), just plan it for the next year and do it properly, not just some half made feature, the whole point of changes tab on github/gitlab/bitbucket in 90% of cases to actually add comments to changes, what will be the point if you can't comment on on type of diff, but can on other?
@Jurigag@lcsondes the comments are going to be released in the later releases: #198457 (closed), stay tuned! The idea is to implement it iteratively and only when comments are supported for the master (HEAD) version, it will become a default option
I've moved the frontend work that left to be done in a separate issue in order to clearly express the requirements, which may have lost among the discussions here: #209071 (closed)
master (HEAD) now appears in the dropdown, but it is not clickable -- the mouse cursor when hovering over it is not that for a link (hand pointing) like with the other options in the dropdown, but rather it is the "text" cursor (the one that looks like a capital 'I' in some fonts). Nothing happens when clicking on it other than the dropdown menu going away. I can't take a screenshot of the cursor itself because of limitations in my OS.
Other details: master (base) is the default view (why?) and all other versions can be selected, just not HEAD. Adding the diff_head=true query parameter selects master (HEAD) from the dropdown but still doesn't display the diff of the merge request with the head of master.
@linuz.vipin Hi The master (HEAD) option should be already displayed in the dropdown, just like in this merge request: gitlab-test!51 (diffs), but master (base) option is still the default one at the moment