In general, we probably haven't done a thorough failure analysis and code review for how that is handled, how we might improve the reliability of this feature. This issue really serves to do a full code audit/walkthrough to make sure the frontend/backend are working properly.
I was able to get the githost.log logs. I see a bunch of errors:
March 26, 2019 11:42 -> INFO -> MergeRequests::RebaseService info (namespace/project!195): rebase startedMarch 26, 2019 11:42 -> ERROR -> MergeRequests::RebaseService error (namespace/project!195): Rebase task canceled: Another rebase is already in progressMarch 26, 2019 11:43 -> INFO -> MergeRequests::RebaseService info (namespace/project!197): rebase startedMarch 26, 2019 11:43 -> ERROR -> MergeRequests::RebaseService error (namespace/project!197): Rebase task canceled: Another rebase is already in progressMarch 26, 2019 11:50 -> ERROR -> MergeRequests::RebaseService error (namespace/project!195): Rebase failed. Please rebase locallyMarch 26, 2019 11:50 -> ERROR -> MergeRequests::RebaseService error (namespace/project!195): GitLab: Failed to authorize your Git request: internal API unreachable<br>
The last internal API unreachable occurs when the API times out. I noticed they have a Git Push Rule active. We disabled it.
@meks Based on our documentation on Severity labels, I don't think ~S1 is appropriate. As far as I'm aware, the functionality isn't completely broken for everyone, it "just" fails more often than it should and it's often not clear why it failed, or how to resolve that. I would say this is ~S2, since it's a "Broken Feature, workaround too complex & unacceptable" and "Many users or multiple paid customers affected (but not apocalyptic)".
If ~S2 implies ~P2, this would be one to prioritize for %12.0.
@cindy I think in that case the "Failed to squash. Should be done manually." error message we show is correct. As Stan mentioned on the ticket:
patch does not apply means exactly that: The patch couldn't be squashed and rebased automatically because there was a conflict. They should attempt to run git rebase -i master to see this for themselves.
If there are other errors, we should address those, but this is a case where the conflict has to be resolved manually.
One reason why squashing may fail is when the MR contains a mix of new commits and merge commits merging the target branch into the source branch. MRs like that need to be rebased before they can be squashed and merged.
This customer keeps opening tickets about this issue I think the current behavior is pretty confusing for them since there's no error conveyed to the user and they can keep clicking the Merge button.
Incidentally, this patch failed error seems to be the most common error I see in a customer's githost.log.
In that merge request, the merge request branched at caac582bda8bf4b09610274afce7ce3897cf837b, but data/team.yml in master has been modified in a way that the patch created with git diff b6caf11cd136f8a974ba839af88d97c16c6830e2 ..caac582bda8bf4b09610274afce7ce3897cf837b no longer applies:
$ git checkout masterPrevious HEAD position was caac582bda Fixed information for Aleksandr SoborovSwitched to branch 'master'Your branch is up to date with 'origin/master'.$ git diff --binary b6caf11cd136f8a974ba839af88d97c16c6830e2.. caac582bda8bf4b09610274afce7ce3897cf837b > /tmp/squash.diff$ git apply --index--whitespace=nowarn < /tmp/squash.differror: patch failed: data/team.yml:15737error: data/team.yml: patch does not apply
Couple of points:
Squashing here is done by generating a diff and trying to apply that diff to master. That can easily fail if the diff does not often apply cleanly to a file that was changed in the target branch.
Squashing from the command-line via git rebase -i HEAD~3 and then merging that branch works fine. This is effectively what adding --3way to the git apply would do:
$ git apply --3way--index--whitespace=nowarn < /tmp/squash.differror: patch failed: data/team.yml:15737Falling back to three-way merge...Applied patch to 'data/team.yml' cleanly.
I can see using --3way might lead to unintended merges, but given that this is how git merge works, isn't this acceptable?
Can we just apply merge-commit (like from refs/merge) and rewrite commit message? :) This should always work exactly with the behavior of merge.
The simplest here would be: git show refs/merge-requests/iid/merge | git apply | git commit and then we could optionally create merge-commit for this single commit.
I would always surprised that we always create merge commit also for squashing, instead of pushing one extra commit on top of the repository, so instead of one squashed commit we end-up with two.
@jramsay and @DouweM it seems like there are more occurrences of this from support and it's costing us time and energy on something that should just work. I think we should consider %11.11 and have this assigned to someone.
it seems like there are more occurrences of this from support and it's costing us time and energy on something that should just work.
@meks could you please provide more supporting context as to why this is such a priority? I can't see any new Zendesk links. As explained by @DouweM above the "error" is actually correct and expected Git feedback because there is a merge conflict.
Also, this issue is unduly broad - squashing and rebasing are different features and shouldn't be conflated.
I have scheduled https://gitlab.com/gitlab-org/gitlab-ce/issues/54117 for 12.0 which a related issue that is much more focused and will close this issue. If there are other specific issues, happy to discuss and prioritize those accordingly.
This is from me observing and hearing feedback from our engineering leads that have been involved with customers. In addition to this, it's a known issue that has taken up debugging time for Customer Support with customers keep raising this as an issue. See also https://gitlab.com/gitlab-org/gitlab-ee/issues/10661#note_162216107 above.
The merge button is something that should just work from a customer standpoint. It seems prudent for us to fix. It would also reduce the tax on our Customer Support and etc.
I think we should reopen this issue since we are dealing with 2 different bugs.
@mlapierre @tnikic for your visibility of customer pain points in ~Create. When a fix for this lands it would make sense to look at the test pyramid for this area and see how we can prevent this from happening in the future.
Updated milestone to reflect 3-way merge improvement that @stanhu has made and should merge soon which will mean that squash/merge should succeed more frequently. Thank you @stanhu!
I still think we need to do a failure analysis of this feature and test we handle all the cases right:
sequenceDiagram; UI->>Controller: User clicks Merge Controller->>Controller: Check if mergeable Controller->>Controller: Check push rules Controller->>Controller: Check head SHA is consistent Controller->>Sidekiq: Schedule Sidekiq merge Sidekiq->>DB: Look up user and merge request DB->>Sidekiq: User and merge request found Sidekiq->>Gitaly: Squash request Gitaly->>Sidekiq: Squash successful Sidekiq->>Gitaly: Merge request Gitaly->>Sidekiq: Merge successful Sidekiq->>DB: Update merge status UI->>Controller: Merge status? Controller->>DB: Load merge request DB->>Controller: Merge request state Controller->>UI: Merge succcessful
Any of these steps can fail. What happens if any of them do? Do we show the right status in a timely manner? I think we can execute some tests. For example:
Keep Sidekiq off. What happens to the UI after the Schedule Sidekiq merge? Do we just show Merge in progress? What happens if we let this go for a long time?
What happens if Gitaly fails after the Squash request? Do we handle this error and show the Git output etc.?
Since we can't know everything ahead of time, ideally we would tag each step with the request correlation ID and log an event in a structured log so we can trace exactly what happened. This is what I mean by testing/developing for failure.
@stanhu Nice job on the Mermaid diagram :) I agree that we should do this, for merges (with and without squash) as well as rebases.
How do you estimate the urgency here? In all of the reports around squashing reliability that I've seen, the underlying issue appears to have been that unclear "patch does not apply" message (which gitaly!1214 (merged) should address), but I'm not seeing an underlying reliability issue here like the obvious one we have with rebases (https://gitlab.com/gitlab-org/gitlab-ce/issues/54117#note_121896366).
How do you estimate the urgency here? In all of the reports around squashing reliability that I've seen, the underlying issue appears to have been that unclear "patch does not apply" message (which gitaly!1214 (merged) should address), but I'm not seeing an underlying reliability issue here like the obvious one we have with rebases (https://gitlab.com/gitlab-org/gitlab-ce/issues/54117#note_121896366).
Yeah, I'm hoping gitaly!1214 (merged) will address the vast majority of the issues. I've also seen merging fail due to the internal API timing out.
I still feel like we have to be aggressive in testing/auditing this feature because this feature has to work. I will try to inject failures to see if we get the right feedback.
@jramsay Note that @stanhu is talking about squashing, while I was talking about rebasing, and Stan's explanation here is much more complete than mine was there!
gitaly!1214 (merged) is actually unrelated to either, and "just" makes our squashing logic a little smarter and less likely to run into conflicts and fail.
The improvements discussed in https://gitlab.com/gitlab-org/gitlab-ce/issues/54117 would help rebase reliability, but still don't go as far as what Stan is proposing doing here, which is to specifically go over each step in the progress, and ensuring that any potential failures are handled a clear and recoverable way.
Thanks @jwoods06. FYI - Quality will be automating nudge and escalation on issues like this in the future gitlab-org/quality/triage-ops#156. I'll capture this in the link patterns we will detect in addition to ZD links.
For now, I am escalating manually, FYI @timzallmann
James Ramsay (ex-GitLab)changed title from Squash/rebase doesn't work reliably or provide right feedback to Squash doesn't work reliably or provide right feedback
changed title from Squash/rebase doesn't work reliably or provide right feedback to Squash doesn't work reliably or provide right feedback
This issue originally included both rebase and squash in the report. Since improvements to squash have been made with gitaly!1214 (merged), but no progress has been made on the rebase issues, I'm delaying this one release in favor of making progress on rebase reliability https://gitlab.com/gitlab-org/gitlab-ce/issues/54117.
Additionally there are other customers running into rebase problems, so it would be a problem to delay that issue further.
@jramsay What do you think about deprioritizing this until/unless we get reports that the improvements in gitaly!1214 (merged) have not been sufficient?
Since steps have already been taken to address this issue, and it is unclear if further action is required, I'm moving this to the Backlog awaiting further investigation https://gitlab.com/gitlab-org/gitlab-ce/issues/64106 and/or reports from customers that the issue hasn't been resolved.