Code Review Spike: Importing Diffs from GH
Problem
Recently a customer identified an issue when importing diffs from GitHub. The issue is that comments on older diffs (commits) and of highlight (although not confirmed causation) when they contain suggestions - the diffs aren’t visible:
In other words, if there is a suggested change on an outdated diff in GH, then GL has problems and says “Unable to Load Diff”. Here is an example in GH: https://github.com/flowower/private-test/pull/2 and here is what it looks like after the import: m_gill/private-test!2
Clicking try again
in this context gives an error in sentry. The error is an AbstractController::DoubleRenderError
, but that is not likely the actual problem.
Some examples that show this problem:
- GitHub MR 1 - https://github.com/flowower/private-test/pull/1
- GitLab Import of MR 1 - m_gill/private-test!1
- GitHub MR 2 - https://github.com/flowower/private-test/pull/2
- GitLab Import of MR 2 - m_gill/private-test!2
A few initial interesting findings:
- In the imported MR - the viewer shows that the thread is on an
outdated diff
even though it’s the latest commit SHA - Comments coming from Github are associated with commits (Github doesn’t use push events to generate versions)
- Comments coming from Github are all associated with the latest commit once imported to GitLab, even though in Github they’re left on previous commits
Previous work in this area:
- Discussions around positioning: #296551 (comment 593031969)
- !18510 (merged)
- #27715 (closed)
- PoC - diff_refs for legacy: !62557 (closed)
Additional customer-specific information can be found in this (internal) document
Spike Issue Expectations
For this issue, we would like to understand the scope of the problem. Once we understand that, we can create new issues to implement the work. Some of the questions we are looking to answer are:
- Is there a viable path forward on the import of the data to have diffs visible and associated with the correct commit?
- What is the level of effort required to achieve this? What is the timeline? Which teams are involve? Code Review? Import?
- What are the unknowns/concerns of this effort?
- Would there be impacts to the length of time imports take?
- Is it possible to have a script apply the fix/update the data after the import?
- Can this be fixed by changing the diff presentation logic (frontend change)?