Refactor GitHub importer
Description
Today we import a GitHub project in a single long-running job. Since Sidekiq does a better job of handling many small jobs than a few long-running ones, we need to refactor the current import process, which expects everything to happen synchronously, so that updating the import status is fully asynchronous. We have also noticed over the last few weeks that when the GitHub importer fails, we don't know why. If we rewrite it, we need to take care of this as well; it will help a lot in tracking down issues.
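To illustrate what an asynchronous status update could look like, here is a minimal sketch in plain Ruby. It assumes that the import is split into a known number of small jobs and that each job reports back when it is done. `ImportTracker`, its method names, and the status symbols are hypothetical and not part of the existing code; in a real Sidekiq setup the counter would probably need to live in shared storage (for example Redis or the database) rather than in memory.

```ruby
# Hypothetical sketch: ImportTracker is an illustrative name, not an existing class.
# It flips the import status only when the last small job has reported back,
# and keeps the reason for every failure.
class ImportTracker
  attr_reader :status, :errors

  def initialize(total_jobs)
    @pending = total_jobs
    @errors  = []
    @status  = total_jobs.zero? ? :finished : :started
    @lock    = Mutex.new
  end

  # Called by a small job when it completes successfully.
  def job_succeeded
    @lock.synchronize { finish_one }
  end

  # Called by a small job when it fails; the reason is recorded.
  def job_failed(job_name, error)
    @lock.synchronize do
      @errors << "#{job_name}: #{error.message}"
      finish_one
    end
  end

  private

  # Decrements the pending counter and sets the final status at zero.
  def finish_one
    @pending -= 1
    return unless @pending.zero?

    @status = @errors.empty? ? :finished : :failed
  end
end

tracker = ImportTracker.new(3)
tracker.job_succeeded
tracker.job_failed("ImportLabels", RuntimeError.new("API rate limit exceeded"))
tracker.job_succeeded

puts tracker.status
puts tracker.errors.inspect
```

With this shape no single job has to wait for the others, and a failed import carries the list of reasons that the UI could later show to the user.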
Related
https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/4166#note_11921862
Proposal
Description
We have plenty of import options:
- GitHub
- BitBucket
- FogBugz
- GitLab
- GitLab Import/Export
- Gitorious.org
- Google Code
- Git repository by URL
But each importer does its job in its own way, which results in:
- Lack of a standard: for example, the comments on an issue are sometimes imported as individual notes and sometimes all inserted into the issue description.
- Lack of feedback about the import process: over the last few weeks we have noticed that when an importer fails, in most cases we don't know why.
- High cost of maintenance: it's not easy to keep the importers updated.
- Impossible to spawn many small jobs instead of keeping a few long-running ones.
We want more people on GitLab, and a lot of them are migrating from these services. So we need to ensure that this process runs smoothly and, most importantly in my view, gives the user appropriate feedback showing why their project can't be imported.
Proposal
This will be a huge refactoring, so I suggest breaking it into 3 steps:
- Refactor importers
- Refactor UI/UX
- Make the import process async
- Refactor Importers
  - Proposal:
    - Wait for replies on my proposal
    - Review existing code
  - Implementation/Refactoring
    - Decouple code to retrieve access tokens from GitHub API wrapper
      - Figure out authorization issues
    - Create API wrapper
      - Create Base API wrapper
      - Create GitHub API wrapper
    - Create GitHub Mappers
      - Labels
      - Milestones
      - Issues
      - Pull Requests
      - Comments
      - Comments on diff
      - User
    - Create Workflow
      - Create ImportRepository action
      - Create ImportLabels action
      - Create ImportMilestones action
      - Create FetchIssues action
      - Create FetchMergeRequests action
      - Create PersistIssue action
      - Create PersistMergeRequest action
      - Create ApplyLabels action
      - Create ImportComments action
      - Create ImportCommentsOnDiff action
      - Create ImportWiki action
    - Keep track of errors
      - API
      - Importing Issues, MRs, Comments, etc
      - Ensure that all actions will be executed
    - Change RepositoryImportWorker to use the new workflow for GitHub projects
    - Verify workflow - Try to import edge case projects
    - Apply same changes for other importers
      - BitBucket
      - FogBugz
      - GitLab
      - Gitorious.org
      - Google Code
      - Git repository by URL
    - Refactor Import Controllers
      - GitHub
      - BitBucket
      - FogBugz
      - GitLab
      - Gitorious.org
      - Google Code
    - Add tests
    - Clean up old code
- Refactor UI
  - Show errors when the import process fails
  - Improve UX (?)
- Make the import process async
  - Refactor RepositoryImportWorker to spawn many small jobs rather than a few long-running ones
    - API Rate Limit
    - Job states and exception handling
    - Ensure execution order for jobs, since jobs can have dependencies among other jobs
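
The workflow, error tracking, and job ordering items in the checklist could fit together roughly as in the sketch below. This is only an illustration under assumptions: the `Workflow` class, its `add`/`run` methods, and the snake_case action names are hypothetical, and the actions are empty blocks standing in for the real importers. It uses `TSort` from Ruby's standard library to order actions by their dependencies, records the reason for each failure, and keeps running the actions that do not depend on a failed one.

```ruby
require "tsort"

# Hypothetical sketch: Workflow and the action names are illustrative only.
class Workflow
  include TSort

  attr_reader :errors, :executed

  def initialize
    @actions  = {} # action name => block to run
    @deps     = {} # action name => names of actions that must run first
    @errors   = {} # action name => reason it failed or was skipped
    @executed = [] # names of actions that completed, in execution order
  end

  # Registers an action; `after` lists the actions it depends on.
  def add(name, after: [], &block)
    @actions[name] = block
    @deps[name] = after
    self
  end

  # Runs every action in dependency order. A failure is recorded and does
  # not stop unrelated actions; dependents of a failed action are skipped.
  # Returns true when no action failed or was skipped.
  def run
    tsort.each do |name|
      failed_dep = @deps[name].find { |dep| @errors.key?(dep) }
      if failed_dep
        @errors[name] = "skipped: #{failed_dep} failed"
        next
      end

      begin
        @actions[name].call
        @executed << name
      rescue StandardError => e
        @errors[name] = e.message
      end
    end
    @errors.empty?
  end

  # Hooks required by TSort: enumerate the nodes and each node's dependencies.
  def tsort_each_node(&block)
    @actions.each_key(&block)
  end

  def tsort_each_child(name, &block)
    @deps.fetch(name).each(&block)
  end
end

workflow = Workflow.new
workflow.add(:import_labels) { }
workflow.add(:fetch_issues) { raise "API rate limit exceeded" }
workflow.add(:persist_issues, after: [:fetch_issues]) { }
workflow.add(:apply_labels, after: [:import_labels, :persist_issues]) { }
workflow.add(:import_wiki) { }
workflow.run

puts workflow.executed.inspect
workflow.errors.each { |name, reason| puts "#{name}: #{reason}" }
```

Skipping the dependents of a failed action is a design choice made for this sketch; retrying the failed action (for example after a rate limit resets) before giving up would be another option. In either case the recorded reasons are what the UI could show when the import fails.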