Importing from GitHub does not migrate all MR comments (GitHub Import tool limitations - Customer blocked)
Summary
Professional Services is working to migrate a customer under a time constraint. So far, the import for this customer has been unsuccessful. The timeline is as follows:
- May 14th - migration canceled
- May 21st - target migration due date
- If May 21st is missed, next opportunity is late August
We are seeing inconsistent behavior when migrating merge request data:
- General discussion comments are hit or miss: currently none are imported from our test repo for merge requests, and very few are imported from our test repo for issues
- Git diffs on outdated diffs
- Diff comment threads are imported as individual comments instead of threads
Issues and Merge Requests are successfully migrated to GitLab from GitHub, but not all comments within them.
Example
For example, this PR on GitHub has 285 comments.
Here is that same MR on the GitLab side after the latest finished import test (2021-05-14):
What we are seeing is that the DiffNotes mostly make it over, but some of the regular comments do not show up at all.
For example,
On GitHub:
From one of the attempts with comments:
The DiffNotes are at least showing up somewhat. The git diff itself is not consistent, and threads do not appear to be retained, but there is still some record of them, which is a great improvement.
What is the current bug behavior?
Not all general discussion comments are being migrated, and comments that happen on diffs show up as individual comments rather than inside threads.
What is the expected correct behavior?
All general discussion comments should be migrated over and comments on diffs should be nested as a thread in order to understand context.
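To make the expected threading concrete, here is a minimal sketch of how diff comments could be grouped into threads. It is not the GitLab importer's implementation. It assumes each comment is a dict shaped like a GitHub pull request review comment, where a reply carries the `in_reply_to_id` of the comment that started the thread; the function name `group_into_threads` and the sample field values are hypothetical.

```python
from collections import defaultdict


def group_into_threads(review_comments):
    """Group review comments into threads keyed by the root comment id.

    Assumption: a reply carries `in_reply_to_id` pointing at the comment
    that started the thread; a root comment has no `in_reply_to_id`.
    """
    threads = defaultdict(list)
    # ISO 8601 timestamps in the same format sort correctly as strings.
    for comment in sorted(review_comments, key=lambda c: c["created_at"]):
        root_id = comment.get("in_reply_to_id") or comment["id"]
        threads[root_id].append(comment)
    return dict(threads)
```

With this grouping, a root comment and its replies land in one list in chronological order, which is the context a reader needs on the GitLab side.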
Relevant logs and/or screenshots
Customer migration with massive data size:
- 17GB repo (base size, which increases with 3k+ branches; final size with git pack data files is over 40GB for one repo/project)
- 70,000+ MRs
Previous/Possibly unrelated fixes
- Fix #1) Adding cache to resume migration after time-out
- Fix #2) Review importer when the author doesn't exist anymore
- Fix #3) Argument error contains NULL
- Fix #4) Rate limit fix
- Fix #5) GitHub importer failing with `undefined method 'id' for nil:NilClass`
- Fix #6) GithubImporter: Optimize Pull Request Review Importer
All of these fixes have been applied and tested in GitLab "sandbox" which replicates AVI dev (10-15%). None of these fixes address the issue of MRs (and MR content) being left behind.
Customer documents
- Internal Escalation document
- PS Project Definition doc (see issues tab)
Solution
We are tracking the following fixes as part of the solution to this issue:
1. Adding cache to resume migration after time-out. MR: !60668 (merged)
2. Review importer when the author doesn't exist anymore. MR: !61257 (merged)
3. Argument error contains NULL. MR: !61480 (merged)
4. Rate limit fix. Issue: #329552 (closed)
5. GitHub importer failing with `undefined method 'id' for nil:NilClass`. Issue: #330294 (closed)
6. Optimize GH Importer pagination when importing Pull Requests. Issue: #331315 (closed)
7. Intermittent issue caused by network/infra. Issue: #332630 (closed)
8. Mark relation as imported after the importer runs, instead of when import is scheduled. Issue: #333246 (closed)
9. Infra: Unexpected interruption of big project imports. Issue: #332616 (closed; labels: infradev, reliability)
10. Fetch 1 comment at a time to work around GH API limitations. Issue: #332630 (closed)
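
The idea behind fix 8 can be illustrated with a short sketch. This is not GitLab's code; `ImportTracker` and `import_relation` are hypothetical names, and the in-memory set stands in for whatever cache the importer actually uses. The point is only the ordering: the relation is recorded as imported after the importer has finished, so an interrupted run is retried rather than skipped.

```python
class ImportTracker:
    """Hypothetical in-memory record of which relations have been imported."""

    def __init__(self):
        self._imported = set()

    def already_imported(self, key):
        return key in self._imported

    def mark_imported(self, key):
        self._imported.add(key)


def import_relation(tracker, key, importer):
    """Run `importer` for `key` unless it already completed.

    Returns True if the importer ran to completion, False if it was skipped.
    """
    if tracker.already_imported(key):
        return False
    # If this raises (time-out, interruption), the key is NOT marked,
    # so a resumed import will try this relation again.
    importer(key)
    tracker.mark_imported(key)
    return True
```

If the marking happened at scheduling time instead, a job that was interrupted midway would still be recorded as done, and its comments would be silently left behind on resume.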