Importing from GitHub does not migrate all MR comments (GitHub Import tool limitations - Customer blocked)

Summary

Professional Services is working to migrate a customer under a time constraint. So far, the import for this customer has been unsuccessful. The timeline is as follows:

  1. May 14th - migration canceled
  2. May 21st - target migration due date
  3. If May 21st is missed, next opportunity is late August

We are seeing inconsistent behavior when migrating merge request data:

  • General discussion comments are hit or miss: currently none are imported from our test repo for merge requests, and very few are imported from our test repo for issues
  • Git diffs are inconsistent on outdated diffs
  • Diff comment threads are imported as individual comments instead of as threads

Issues and Merge Requests are successfully migrated from GitHub to GitLab, but not all of the comments within them are.

Example

For example, this PR on GitHub has 285 comments.

Here is that same MR on the GitLab side after the latest finished import test (2021-05-14):

image

What we are seeing is that the DiffNotes are mostly making it over, but some of the regular comments are not showing up at all.

For example,

On GitHub:

image

From one of the attempts with comments:

image

The DiffNotes are at least showing up somewhat. The git diff itself is not consistent, and threads do not appear to be retained, but there is still some record of it which is a great improvement.

What is the current bug behavior?

Not all general discussion comments are being migrated, and comments that happen on diffs show up as individual comments rather than inside threads.
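
One way to quantify the gap for a single PR/MR pair is to tally comments by kind on each side. The following is a minimal sketch, not part of the importer. It assumes the comment lists have already been fetched (for example, GitHub issue comments and pull request review comments on one side, and the merge request notes from the GitLab API on the other), and it assumes GitLab notes carry a `system` flag and a `type` of `"DiffNote"` for diff comments. The function names are hypothetical.

```python
def tally_github(issue_comments, review_comments):
    """Count source-side comments: general discussion vs. diff comments."""
    return {"general": len(issue_comments), "diff": len(review_comments)}


def tally_gitlab(notes):
    """Count imported notes by kind, skipping system-generated notes.

    Assumption: each note is a dict with a boolean "system" field and a
    "type" field that is "DiffNote" for comments attached to a diff.
    """
    counts = {"general": 0, "diff": 0}
    for note in notes:
        if note.get("system"):
            continue  # e.g. "added 3 commits", not a user comment
        kind = "diff" if note.get("type") == "DiffNote" else "general"
        counts[kind] += 1
    return counts


def missing(github_counts, gitlab_counts):
    """Return how many comments of each kind did not make it over."""
    return {
        kind: github_counts[kind] - gitlab_counts[kind]
        for kind in ("general", "diff")
    }
```

A positive number under `general` would correspond to the missing general discussion comments described above. Counts alone do not show whether threading was preserved.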

What is the expected correct behavior?

All general discussion comments should be migrated over, and comments on diffs should be nested as a thread so that their context is preserved.
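
To illustrate what "nested as a thread" means on the source side, here is a minimal sketch that groups GitHub pull request review comments into threads. It is not the importer's implementation. It assumes each comment is a dict with an `id`, and that replies carry an `in_reply_to_id` pointing at the first comment of their thread. It also assumes that sorting by `id` approximates chronological order. The function name is hypothetical.

```python
def group_into_threads(review_comments):
    """Group review comments into threads keyed by the root comment id.

    Assumption: a reply's "in_reply_to_id" refers to the thread's first
    comment; a comment without that field starts its own thread.
    """
    threads = {}
    # Sort by id so each root comment is seen before its replies
    # (assumes ids increase over time).
    for comment in sorted(review_comments, key=lambda c: c["id"]):
        root_id = comment.get("in_reply_to_id") or comment["id"]
        threads.setdefault(root_id, []).append(comment)
    return threads
```

With this grouping, each key would map to one discussion thread on the GitLab side, rather than each comment appearing as a separate note.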

Relevant logs and/or screenshots

Customer migration with massive data size:

  • 17GB repo (base size, which increases with 3k+ branches; final size with git pack data files is over 40GB for one repo/project)
  • 70,000+ MRs

Previous/Possibly unrelated fixes

  • Fix #1) Adding cache to resume migration after time-out
  • Fix #2) Review importer when the author doesn't exist anymore
  • Fix #3) Argument error contains NULL
  • Fix #4) Rate limit fix
  • Fix #5) Github importer failing with `undefined method 'id' for nil:NilClass`
  • Fix #6) GithubImporter: Optimize Pull Request Review Importer

All of these fixes have been applied and tested in the GitLab "sandbox", which replicates AVI dev (10-15%). None of these fixes addresses the issue of MRs (and MR content) being left behind.

Customer documents

Solution

We are tracking the following fixes as part of the solution to this issue:

Edited by Haris Delalić