GitHub Import - Execute migration in two phases

Context

During a GitHub import, some stages take much longer than others because they need a separate API request for each Merge Request (MR) or Issue. The following table lists the workers involved in the process and whether each one uses a single endpoint (around 100 times slower) or a collection endpoint (faster).

| Stage | Single or collection endpoint | Worker description |
| --- | --- | --- |
| `ImportRepositoryWorker` | | This worker imports the repository and wiki, scheduling the next stage when done. |
| `ImportBaseDataWorker` | | This worker imports base data such as labels, milestones, and releases. |
| `ImportPullRequestsWorker` | Collection endpoint | This worker imports all pull requests. |
| `ImportCollaboratorsWorker` | Collection endpoint | This worker imports only direct repository collaborators who are not outside collaborators. |
| `ImportPullRequestsMergedByWorker` | Single endpoint 🔴 | This worker imports the pull requests’ merged-by user information. |
| `ImportPullRequestsReviewRequestsWorker` | Single endpoint 🔴 | This worker imports assigned reviewers of pull requests. |
| `ImportPullRequestsReviewsWorker` | Single endpoint 🔴 | This worker imports reviews of pull requests. |
| `ImportIssuesAndDiffNotesWorker` | For issues, collection endpoint. For DiffNotes, single endpoint 🔴 when `single_endpoint_notes_import` is true | This worker imports all issues, and pull requests diff notes. |
| `ImportIssueEventsWorker` | Single endpoint 🔴 | This worker imports all issues and pull requests events. |
| `ImportNotesWorker` | Single endpoint 🔴 when option `single_endpoint_notes_import` is true | This worker imports regular comments for issues and pull requests. |
| `ImportAttachmentsWorker` | Collection endpoint | This worker imports note attachments linked inside Markdown. |
| `ImportProtectedBranchesWorker` | Single endpoint, but not relevant | This worker imports protected branch rules. |
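
The "around 100 times slower" figure can be illustrated by counting requests. The following is a rough sketch, not the importer's actual accounting: it assumes a page size of 100 (the maximum that most GitHub REST list endpoints accept) and assumes a single endpoint needs exactly one request per pull request or issue, which is a lower bound. The function names and the project size are hypothetical.

```python
import math

# Assumption: collection endpoints are paginated with 100 items per page.
PER_PAGE = 100


def collection_requests(item_count: int, per_page: int = PER_PAGE) -> int:
    """Requests needed to list item_count items through a paginated collection endpoint."""
    return max(1, math.ceil(item_count / per_page))


def single_requests(parent_count: int) -> int:
    """Requests needed when every pull request or issue requires its own call."""
    return parent_count


# Hypothetical project size, chosen only for illustration.
pull_requests = 5000
print(collection_requests(pull_requests))  # 50
print(single_requests(pull_requests))      # 5000
```

Under these assumptions, a stage that uses a single endpoint issues 100 times as many requests as a stage that uses a collection endpoint for the same number of pull requests.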

Problem

Large projects with thousands of MRs and Issues spend a long time in the stages that use a single endpoint. To make matters worse, some of these stages do not migrate essential information, yet they are executed at the beginning of the process.

Idea / Proposal

The importer execution process should be split into two phases. In the first phase, only essential data should be imported, allowing users to start using the migrated project as soon as possible. The second phase should be executed after the first and should not block users from using the project.

Phase 1

| Stage | Single or collection endpoint |
| --- | --- |
| `ImportRepositoryWorker` | |
| `ImportBaseDataWorker` | |
| `ImportPullRequestsWorker` | Collection endpoint |
| `ImportCollaboratorsWorker` | Collection endpoint |
| `ImportIssuesAndDiffNotesWorker` | For issues, collection endpoint. For DiffNotes, single endpoint 🔴 when `single_endpoint_notes_import` is true |
| `ImportNotesWorker` | Single endpoint 🔴 when option `single_endpoint_notes_import` is true |
| `ImportAttachmentsWorker` | Collection endpoint |
| `ImportProtectedBranchesWorker` | Single endpoint, but not relevant |

Phase 2

| Stage | Single or collection endpoint | Notes |
| --- | --- | --- |
| `ImportPullRequestsMergedByWorker` | Single endpoint 🔴 | |
| `ImportPullRequestsReviewRequestsWorker` | Single endpoint 🔴 | |
| `ImportPullRequestsReviewsWorker` | Single endpoint 🔴 | |
| `ImportIssueEventsWorker` | Single endpoint 🔴 | |
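
The split described above could be sketched as two ordered stage lists with a hand-off point between them. This is a simplified, synchronous illustration only. The stage names and their grouping come from the Phase 1 and Phase 2 tables, while `run_import`, `run_stage`, and `on_phase_1_done` are hypothetical names that do not exist in the importer.

```python
from typing import Callable, List

# Stage grouping taken from the Phase 1 and Phase 2 tables.
PHASE_1: List[str] = [
    "ImportRepositoryWorker",
    "ImportBaseDataWorker",
    "ImportPullRequestsWorker",
    "ImportCollaboratorsWorker",
    "ImportIssuesAndDiffNotesWorker",
    "ImportNotesWorker",
    "ImportAttachmentsWorker",
    "ImportProtectedBranchesWorker",
]
PHASE_2: List[str] = [
    "ImportPullRequestsMergedByWorker",
    "ImportPullRequestsReviewRequestsWorker",
    "ImportPullRequestsReviewsWorker",
    "ImportIssueEventsWorker",
]


def run_import(
    run_stage: Callable[[str], None],
    on_phase_1_done: Callable[[], None],
) -> None:
    """Run every phase 1 stage, signal that the project is usable, then run phase 2."""
    for stage in PHASE_1:
        run_stage(stage)
    # Hypothetical hook: the point at which users could start using the project.
    on_phase_1_done()
    for stage in PHASE_2:
        run_stage(stage)


# Usage: record the order of events instead of performing real imports.
events: List[str] = []
run_import(events.append, lambda: events.append("project usable"))
print(events.index("project usable"))  # 8
```

The single hand-off point keeps all slow, non-essential single-endpoint stages after the moment the project becomes usable.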

Iterations 👣

    1. Migrating in two phases would be available only via the API. Users would need to call the API to find out whether phase 1 or phase 2 is running.
    1. We would then support two-phase migration in the UI. The UI would display an alert while phase 2 is still running.
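
Both iterations need a way to report which phase is running. A minimal sketch of what that could look like follows, assuming the import tracks two completion flags. Every name here (`ImportState`, `api_status`, `ui_alert`, the response keys, and the alert text) is hypothetical and not part of any existing API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImportState:
    """Hypothetical import state with one completion flag per phase."""
    phase_1_finished: bool
    phase_2_finished: bool


def current_phase(state: ImportState) -> Optional[int]:
    """Return 1 or 2 for the running phase, or None when the import is complete."""
    if not state.phase_1_finished:
        return 1
    if not state.phase_2_finished:
        return 2
    return None


def api_status(state: ImportState) -> dict:
    """Iteration 1: a payload an API response could expose (keys are assumptions)."""
    phase = current_phase(state)
    return {
        "import_status": "finished" if phase is None else "started",
        "running_phase": phase,
    }


def ui_alert(state: ImportState) -> Optional[str]:
    """Iteration 2: alert text shown only while phase 2 is still running."""
    if current_phase(state) == 2:
        return "Import phase 2 is still running. Some data may not be available yet."
    return None


state = ImportState(phase_1_finished=True, phase_2_finished=False)
print(api_status(state))  # {'import_status': 'started', 'running_phase': 2}
print(ui_alert(state))
```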
Edited by Rodrigo Tomonari