Refactor GitHub importer
Description
Today we import a GitHub project in a single long-running job. Sidekiq handles many small jobs better than a few long-running ones, so we need to refactor the current import process, which assumes everything happens synchronously in order to update the import status, to be fully asynchronous. We have also noticed over the last few weeks that when the GitHub importer fails, we don’t know why. We need to take care of this if we rewrite it; it will help a lot in tracking down issues.
Related
https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/4166#note_11921862
Proposal
Description
We have plenty of import options:
- GitHub
- BitBucket
- FogBugz
- GitLab
- GitLab Import/Export
- Gitorious.org
- Google Code
- Git repository by URL
But each importer does its job in its own way, which means:
- Lack of a standard: for example, the comments on an issue are sometimes imported as individual notes and sometimes all inserted into the issue description.
- Lack of feedback about the import process: as we have noticed over the last few weeks, when an importer fails, in most cases we don’t know why.
- High cost of maintenance: it's not easy to keep them updated.
- Impossible to spawn a lot of small jobs instead of keeping a few long-running ones.
We want more people on GitLab, and a lot of them are migrating from these services. So we need to ensure that this process runs smoothly and, most importantly in my view, gives the user appropriate feedback showing why their project can't be imported.
Proposal
This will be a huge refactoring, so I would like to suggest breaking it into 3 steps:
- Refactor importers
- Refactor UI/UX
- Make the import process async
Refactor Importers
- Proposal:
  - Wait for replies on my proposal
  - Review existing code
- Implementation/Refactoring
  - Decouple code to retrieve access tokens from GitHub API wrapper
    - Figure out authorization issues
  - Create API wrapper
    - Create Base API wrapper
    - Create GitHub API wrapper
  - Create GitHub Mappers
    - Labels
    - Milestones
    - Issues
    - Pull Requests
    - Comments
    - Comments on diff
    - User
  - Create Workflow
    - Create ImportRepository action
    - Create ImportLabels action
    - Create ImportMilestones action
    - Create FetchIssues action
    - Create FetchMergeRequests action
    - Create PersistIssue action
    - Create PersistMergeRequest action
    - Create ApplyLabels action
    - Create ImportComments action
    - Create ImportCommentsOnDiff action
    - Create ImportWiki action
  - Keep track of errors
    - API
    - Importing Issues, MRs, Comments, etc.
    - Ensure that all actions will be executed
  - Change RepositoryImportWorker to use the new workflow for GitHub projects
  - Verify workflow
    - Try to import edge-case projects
  - Apply same changes for other importers
    - BitBucket
    - FogBugz
    - GitLab
    - Gitorious.org
    - Google Code
    - Git repository by URL
  - Refactor Import Controllers
    - GitHub
    - BitBucket
    - FogBugz
    - GitLab
    - Gitorious.org
    - Google Code
  - Add tests
  - Clean up old code
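To make the workflow and error-tracking items above more concrete, here is a rough sketch in plain Ruby. It is only an illustration: the ImportSketch module, the Action base class, the Workflow runner, and the simulated failure are all hypothetical and are not existing GitLab code. Only the action names come from the list above. The idea is that each action runs in turn, a failure is recorded instead of aborting the run, and every action still gets executed.

```ruby
# Hypothetical sketch; none of these classes exist in GitLab as written here.
module ImportSketch
  # Outcome of one action: its name plus the error message, if any.
  ActionResult = Struct.new(:action, :error) do
    def success?
      error.nil?
    end
  end

  # Base class for a single import step.
  class Action
    def name
      self.class.name.split('::').last
    end

    def call(_context)
      raise NotImplementedError, "#{name} must implement #call"
    end
  end

  class ImportLabels < Action
    def call(context)
      context[:labels] = %w[bug feature] # stand-in for real API data
    end
  end

  class ImportMilestones < Action
    def call(_context)
      raise 'API rate limit exceeded' # simulated failure
    end
  end

  class ImportWiki < Action
    def call(context)
      context[:wiki] = true
    end
  end

  # Runs every action in order and records a result for each one.
  class Workflow
    attr_reader :results

    def initialize(actions)
      @actions = actions
      @results = []
    end

    def run(context = {})
      @actions.each do |action|
        begin
          action.call(context)
          @results << ActionResult.new(action.name, nil)
        rescue StandardError => e
          # Record the failure and keep going so later actions still run.
          @results << ActionResult.new(action.name, e.message)
        end
      end
      context
    end

    # Human-readable list of failures that could be shown to the user.
    def errors
      @results.reject(&:success?).map { |r| "#{r.action}: #{r.error}" }
    end
  end
end

workflow = ImportSketch::Workflow.new(
  [ImportSketch::ImportLabels.new,
   ImportSketch::ImportMilestones.new,
   ImportSketch::ImportWiki.new]
)
workflow.run
p workflow.errors # => ["ImportMilestones: API rate limit exceeded"]
```

Whether a failed action should really let the following ones continue is a design decision still to be made; the sketch simply shows one way to keep the failure reason available for the UI.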
Refactor UI
- Show errors when the import process fails
- Improve UX (?)
Make the import process async
- Refactor RepositoryImportWorker to spawn a lot of small jobs rather than a few long-running ones
  - API Rate Limit
  - Job states and exception handling
  - Ensure execution order for jobs, since jobs can depend on other jobs
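For the execution-order point, one possible approach is to declare which jobs depend on which and derive a safe order from that. The sketch below uses TSort from Ruby's standard library. The JobGraph class and the dependency table are assumptions made up for illustration; the real dependencies between the actions still need to be worked out.

```ruby
require 'tsort'

# Hypothetical helper: orders jobs so that each one comes after
# everything it depends on.
class JobGraph
  include TSort

  # dependencies: Hash of job name => array of job names it depends on.
  def initialize(dependencies)
    @dependencies = dependencies
  end

  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  def tsort_each_child(job, &block)
    @dependencies.fetch(job, []).each(&block)
  end
end

# Assumed dependencies, for illustration only.
DEPENDENCIES = {
  'ImportRepository' => [],
  'ImportLabels'     => ['ImportRepository'],
  'ImportMilestones' => ['ImportRepository'],
  'FetchIssues'      => %w[ImportLabels ImportMilestones],
  'PersistIssue'     => ['FetchIssues'],
  'ApplyLabels'      => ['PersistIssue'],
  'ImportComments'   => ['PersistIssue']
}.freeze

# Dependencies always appear before the jobs that need them.
# TSort raises TSort::Cyclic if the table contains a cycle.
order = JobGraph.new(DEPENDENCIES).tsort
puts order
```

With small Sidekiq jobs, an order like this could decide which job gets enqueued once its prerequisites have finished, but how job state is tracked between jobs is outside the scope of this sketch.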