Change stage execution order in GitHub Import
Context
As described in our GitHub Import development docs, GitHub Import imports resources in stages. As noted in #431603 (closed), some stages can take a long time to complete because they retrieve all of their data through a single endpoint.
Currently, the stages are executed in the following order (🐰 = fast stage, 🐢 = slow stage):
- 🐰 Stage::ImportRepositoryWorker
- 🐰 Stage::ImportBaseDataWorker
- 🐰 Stage::ImportPullRequestsWorker
- 🐰 Stage::ImportCollaboratorsWorker
- 🐢 Stage::ImportPullRequestsMergedByWorker
- 🐢 Stage::ImportPullRequestsReviewRequestsWorker
- 🐢 Stage::ImportPullRequestsReviewsWorker
- 🐢 Stage::ImportIssuesAndDiffNotesWorker
- 🐢 Stage::ImportIssueEventsWorker
- 🐢 Stage::ImportNotesWorker
- 🐰 Stage::ImportAttachmentsWorker
- 🐰 Stage::ImportProtectedBranchesWorker
- 🐰 Stage::ImportLfsObjectsWorker
Idea
The current order does not reflect how important each resource is to users, nor how long each stage can take to run. Importing the most important resources first would help users if a migration fails partway through: they could decide to use the project without the missing resources if those resources aren't important to them.
Proposed solution
Change the execution order of the stages.
- 🐰 Stage::ImportRepositoryWorker
- 🐰 Stage::ImportBaseDataWorker
- 🐰 Stage::ImportProtectedBranchesWorker
- 🐰 Stage::ImportPullRequestsWorker
- 🐰 Stage::ImportIssues (split ImportIssuesAndDiffNotesWorker into two stages)
- 🐰 Stage::ImportCollaboratorsWorker
- 🐰 Stage::ImportLfsObjectsWorker
- 🐢 Stage::DiffNotesWorker (split ImportIssuesAndDiffNotesWorker into two stages)
- 🐢 Stage::ImportNotesWorker
- 🐰 Stage::ImportAttachmentsWorker
- 🐢 Stage::ImportPullRequestsMergedByWorker
- 🐢 Stage::ImportPullRequestsReviewRequestsWorker
- 🐢 Stage::ImportPullRequestsReviewsWorker
- 🐢 Stage::ImportIssueEventsWorker
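To make the effect of the reordering concrete, here is a small sketch that encodes both stage lists from this issue and counts how many fast stages complete before the first slow stage starts. The "fast"/"slow" labels simply mirror the 🐰/🐢 markers above; they are not measured durations. The helpers `fast_stages_before_first_slow` and `stage_names` are hypothetical and exist only for illustration. They are not part of GitLab. The names `ImportIssues` and `DiffNotesWorker` are the proposed split stages, not existing workers.

```python
# Stage lists copied from this issue; "fast"/"slow" mirror the 🐰/🐢 markers
# (assumption: the markers are the only speed information available).
CURRENT_ORDER = [
    ("ImportRepositoryWorker", "fast"),
    ("ImportBaseDataWorker", "fast"),
    ("ImportPullRequestsWorker", "fast"),
    ("ImportCollaboratorsWorker", "fast"),
    ("ImportPullRequestsMergedByWorker", "slow"),
    ("ImportPullRequestsReviewRequestsWorker", "slow"),
    ("ImportPullRequestsReviewsWorker", "slow"),
    ("ImportIssuesAndDiffNotesWorker", "slow"),
    ("ImportIssueEventsWorker", "slow"),
    ("ImportNotesWorker", "slow"),
    ("ImportAttachmentsWorker", "fast"),
    ("ImportProtectedBranchesWorker", "fast"),
    ("ImportLfsObjectsWorker", "fast"),
]

PROPOSED_ORDER = [
    ("ImportRepositoryWorker", "fast"),
    ("ImportBaseDataWorker", "fast"),
    ("ImportProtectedBranchesWorker", "fast"),
    ("ImportPullRequestsWorker", "fast"),
    ("ImportIssues", "fast"),            # proposed split, part 1 (hypothetical name)
    ("ImportCollaboratorsWorker", "fast"),
    ("ImportLfsObjectsWorker", "fast"),
    ("DiffNotesWorker", "slow"),         # proposed split, part 2 (hypothetical name)
    ("ImportNotesWorker", "slow"),
    ("ImportAttachmentsWorker", "fast"),
    ("ImportPullRequestsMergedByWorker", "slow"),
    ("ImportPullRequestsReviewRequestsWorker", "slow"),
    ("ImportPullRequestsReviewsWorker", "slow"),
    ("ImportIssueEventsWorker", "slow"),
]


def fast_stages_before_first_slow(order):
    """Count the leading fast stages, i.e. those that finish before any slow stage runs."""
    count = 0
    for _name, speed in order:
        if speed == "slow":
            break
        count += 1
    return count


def stage_names(order):
    """Return the set of stage names in an ordering."""
    return {name for name, _speed in order}


if __name__ == "__main__":
    print(fast_stages_before_first_slow(CURRENT_ORDER))   # 4
    print(fast_stages_before_first_slow(PROPOSED_ORDER))  # 7
    # Only the split stage differs between the two lists.
    print(sorted(stage_names(CURRENT_ORDER) - stage_names(PROPOSED_ORDER)))
    print(sorted(stage_names(PROPOSED_ORDER) - stage_names(CURRENT_ORDER)))
```

Under these labels, the proposed order raises the number of fast stages that finish before any slow stage from 4 to 7. Apart from splitting ImportIssuesAndDiffNotesWorker, both lists contain the same stages.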