GitHub Import - Execute migration in two phases
Context
During the GitHub Import process, some stages take much longer than others because they make an API request for every pull request (imported as a merge request) or issue. The following table lists the workers involved in the process and whether each one uses a single endpoint (one request per pull request or issue, around 100 times slower) or a collection endpoint (paginated lists of many items, faster).
| Stage | Endpoint type | Description |
|---|---|---|
| ImportRepositoryWorker | | Imports the repository and wiki, and schedules the next stage when done. |
| ImportBaseDataWorker | | Imports base data such as labels, milestones, and releases. |
| ImportPullRequestsWorker | Collection endpoint | Imports all pull requests. |
| ImportCollaboratorsWorker | Collection endpoint | Imports only direct repository collaborators who are not outside collaborators. |
| ImportPullRequestsMergedByWorker | Single endpoint | Imports the pull requests' merged-by user information. |
| ImportPullRequestsReviewRequestsWorker | Single endpoint | Imports the assigned reviewers of pull requests. |
| ImportPullRequestsReviewsWorker | Single endpoint | Imports pull request reviews. |
| ImportIssuesAndDiffNotesWorker | Collection endpoint for issues; single endpoint for diff notes when `single_endpoint_notes_import` is true | Imports all issues and pull request diff notes. |
| ImportIssueEventsWorker | Single endpoint | Imports all issue and pull request events. |
| ImportNotesWorker | Single endpoint when `single_endpoint_notes_import` is true | Imports regular comments on issues and pull requests. |
| ImportAttachmentsWorker | Collection endpoint | Imports note attachments linked inside Markdown. |
| ImportProtectedBranchesWorker | Single endpoint, but not relevant to the slowdown | Imports protected branch rules. |
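The "around 100 times slower" figure follows from pagination: GitHub's REST collection endpoints return up to 100 items per page, while a single-endpoint stage needs at least one request per pull request or issue. The Python sketch below estimates request counts for a hypothetical project size. It ignores rate limits, retries, and per-item pagination, and it is not the importer's actual code.

```python
import math

# GitHub REST collection endpoints return at most 100 items per page
# (the documented maximum for the `per_page` parameter).
MAX_PER_PAGE = 100


def collection_requests(item_count: int, per_page: int = MAX_PER_PAGE) -> int:
    """Requests needed to list `item_count` items through a paginated collection endpoint."""
    if item_count <= 0:
        return 1  # one request is still made to learn that the collection is empty
    return math.ceil(item_count / per_page)


def single_endpoint_requests(item_count: int) -> int:
    """Lower bound of requests when a stage calls one endpoint per pull request or issue."""
    return item_count


if __name__ == "__main__":
    pull_requests = 10_000  # hypothetical project size
    print("collection:", collection_requests(pull_requests))
    print("single:", single_endpoint_requests(pull_requests))
```

With 10,000 pull requests, a collection stage needs about 100 requests while a single-endpoint stage needs at least 10,000, which is the 100x gap described above.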
Problem
For large projects with thousands of pull requests and issues, the stages that use a single endpoint take a long time to complete. To make things worse, some of these stages do not import essential information, yet they run early in the process: the merged-by, review request, and review stages all run before issues and notes are imported.
Idea / Proposal
The importer execution process should be split into two phases. In the first phase, only essential data should be imported, allowing users to start using the migrated project as soon as possible. The second phase should be executed after the first and should not block users from using the project.
Phase 1
| Stage | Endpoint type |
|---|---|
| ImportRepositoryWorker | |
| ImportBaseDataWorker | |
| ImportPullRequestsWorker | Collection endpoint |
| ImportCollaboratorsWorker | Collection endpoint |
| ImportIssuesAndDiffNotesWorker | Collection endpoint for issues; single endpoint for diff notes when `single_endpoint_notes_import` is true |
| ImportNotesWorker | Single endpoint when `single_endpoint_notes_import` is true |
| ImportAttachmentsWorker | Collection endpoint |
| ImportProtectedBranchesWorker | Single endpoint, but not relevant to the slowdown |
Phase 2
| Stage | Endpoint type |
|---|---|
| ImportPullRequestsMergedByWorker | Single endpoint |
| ImportPullRequestsReviewRequestsWorker | Single endpoint |
| ImportPullRequestsReviewsWorker | Single endpoint |
| ImportIssueEventsWorker | Single endpoint |
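The two-phase flow can be sketched as follows. This is a minimal, synchronous Python illustration, not the importer's implementation: `Phase`, `MigrationState`, and `run_two_phase_import` are hypothetical names, and the real workers run asynchronously. The stages run one after another, mirroring how each worker schedules the next stage when done. The point it shows is where the project becomes usable: right after the last Phase 1 stage, before any Phase 2 stage starts.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List

# Stage order taken from the Phase 1 and Phase 2 tables.
PHASE_1_STAGES = [
    "ImportRepositoryWorker",
    "ImportBaseDataWorker",
    "ImportPullRequestsWorker",
    "ImportCollaboratorsWorker",
    "ImportIssuesAndDiffNotesWorker",
    "ImportNotesWorker",
    "ImportAttachmentsWorker",
    "ImportProtectedBranchesWorker",
]
PHASE_2_STAGES = [
    "ImportPullRequestsMergedByWorker",
    "ImportPullRequestsReviewRequestsWorker",
    "ImportPullRequestsReviewsWorker",
    "ImportIssueEventsWorker",
]


class Phase(Enum):
    NOT_STARTED = "not_started"
    PHASE_1 = "phase_1"
    PHASE_2 = "phase_2"
    FINISHED = "finished"


@dataclass
class MigrationState:
    """Hypothetical import state; not GitLab's real data model."""
    phase: Phase = Phase.NOT_STARTED
    project_usable: bool = False
    completed: List[str] = field(default_factory=list)


def run_two_phase_import(run_stage: Callable[[str, MigrationState], None]) -> MigrationState:
    state = MigrationState()
    for phase, stages in ((Phase.PHASE_1, PHASE_1_STAGES), (Phase.PHASE_2, PHASE_2_STAGES)):
        state.phase = phase
        for stage in stages:
            run_stage(stage, state)  # the next stage starts only after this one returns
            state.completed.append(stage)
        if phase is Phase.PHASE_1:
            # Essential data is imported: users can start working on the
            # project while Phase 2 continues.
            state.project_usable = True
    state.phase = Phase.FINISHED
    return state


if __name__ == "__main__":
    final = run_two_phase_import(
        lambda stage, s: print(f"{s.phase.value}: {stage} (usable={s.project_usable})")
    )
    print(final.phase.value)
```

Keeping the stage lists as plain data makes it cheap to move a worker between phases if profiling shows a different split works better.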
Iterations 👣
- First iteration: migration in two phases would be available only via the API. Users would need to use the API to know whether phase 1 or phase 2 is running.
- Second iteration: the UI would support two-phase migration and display an alert while phase 2 is still running.
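The two iterations differ mainly in where the phase information is surfaced. A minimal Python sketch of both, assuming a hypothetical `import_phase` value with the states `phase_1`, `phase_2`, and `finished` (no such API field is defined yet): the first iteration exposes the phase in an API response, and the second turns the same value into a UI alert.

```python
from typing import Optional

# Hypothetical phase identifiers; the real API fields have not been defined.
PHASES = ("phase_1", "phase_2", "finished")


def import_status_payload(phase: str) -> dict:
    """First iteration: what an API response could expose about the running phase."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    return {
        "import_phase": phase,  # hypothetical field name
        # Once phase 1 is done, the project holds the essential data and can be used.
        "project_usable": phase != "phase_1",
    }


def phase_2_alert(phase: str) -> Optional[str]:
    """Second iteration: text the UI could show while phase 2 is still running."""
    if phase == "phase_2":
        return (
            "Import of additional data (merged-by users, review requests, "
            "reviews, and issue events) is still running."
        )
    return None
```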