Skip to content

GitHub Importer: Unable to show diffs after import

Problem

Recently a customer identified an issue when importing diffs from GitHub. The issue is that comments on older diffs (commits) and of highlight (although not confirmed causation) when they contain suggestions - the diffs aren’t visible:

Screen_Shot_2021-09-07_at_2.39.00_PM

In other words, if there is a suggested change on an outdated diff in GH, then GL has problems and says “Unable to Load Diff”. Here is an example in GH: https://github.com/flowower/private-test/pull/2 and here is what it looks like after the import: m_gill/private-test!2 (closed)

Clicking try again in this context gives an error in sentry. The error is an AbstractController::DoubleRenderError, but that is not likely the actual problem.

Some examples that show this problem:

A few initial interesting findings:

  • In the imported MR - the viewer shows that the thread is on an outdated diff even though it’s the latest commit SHA
  • Comments coming from Github are associated with commits (Github doesn’t use push events to generate versions)
  • Comments coming from Github are all associated with the latest commit once imported to GitLab, even though in Github they’re left on previous commits

Previous work in this area:

Additional customer-specific information can be found in this (internal) document

Investigation findings

From the spike issue:

I created my own GH project for testing (https://github.com/patrickbajao/gl-import) as I don't have access to Michelle's project. If someone needs access to this repo, feel free to ping me with you GH handle and I'll invite you.

Imported that project to my GDK so I can investigate the bug a little bit further (e.g. access the DB records).

First issue: Expanding outdated thread can't load the diff

After importing, I was able to replicate the same issue as can be seen on m_gill/private-test!2 (closed): the discussion on old thread cannot load diff when expanded.

Further investigation shows that this is because whenever we expand a discussion, we try to find the note record so we can display the diff by discussion ID. For some reason, the LegacyDiffNote generated after import has no discussion_id in the DB (it's nil). Since it's nil, it can't be found so we see that error state. We need to find the note so we can display the diff stored in that record.

Looking at the Note model (parent of LegacyDiffNote), we require presence of discussion_id which makes me wonder how it was bypassed. Digging further, found out that the DiffNoteImporter code does not set it and we use bulk_insert to create the records. That explains why nil discussion_id was allowed as bulk_insert bypasses validation and callbacks.

@georgekoltsov would you be able to answer if it's intentional that we aren't setting the discussion_id? Asking because I'm not familiar with the import logic. If it's not intentional, can we build a discussion_id for every LegacyDiffNote the importer creates?

Here's a PoC MR I have that does it: !69842 (closed). But I wonder if there are repercussions of doing this from import perspective.

Second issue: Commit shown on outdated thread isn't accurate

Another finding is that the commit shown on the "started a thread on an outdated change in commit" text is not the correct commit. The commit shown is the latest commit which made the diff outdated.

Looking at the same DiffNoteImporter code, we just pass the note.commit_id when creating it. The note in this context is not a LegacyDiffNote yet but a Gitlab::GithubImport::Representation::DiffNote.

Based on my understanding of the flow, this is the commit_id we get from GH's API. Looking at their API though (https://docs.github.com/en/rest/reference/pulls#list-review-comments-on-a-pull-request), there seems to be another property called original_commit_id which is different from commit_id.

Tried it out also in the same PoC MR (!69842 (closed)) and it's now showing the appropriate commit on import.

Another question @georgekoltsov, would you know why we're using commit_id instead of original_commit_id?

Based on my testing, the PoC MR fixes the issues described in this issue. It'll only fix the problem for succeeding imports. Old imports need to be fixed differently by backfilling data which may be more complex. But I do want to confirm with @georgekoltsov first because they may be already looked at before and may cause some issues with the importer.

Proposal

Implement fixes based on the findings/solutions suggested in #340330 (comment 671955596) and the PoC MR: !69842 (closed).

Edited by Haris Delalić