Refactor GitHub importer

Description

Today a single long-running job imports a GitHub project. Sidekiq copes better with many small jobs than with a few long-running ones, so we need to refactor the current import process, which assumes that everything happens synchronously, so that updating the import status is fully asynchronous. Over the last few weeks we have also noticed that when the GitHub importer fails, we don't know why. A rewrite should take care of that as well; it will help us track down a lot of issues.
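
As a rough illustration of the idea, here is a minimal sketch of splitting an import into small jobs while keeping track of the overall status and the reason for any failure. Everything in it is an assumption made for the example: `ImportState`, `job_done`, and the job names are hypothetical, and a plain array stands in for the Sidekiq queue so the snippet runs on its own.

```ruby
# Hypothetical sketch: none of these names come from the GitLab codebase.
# A plain Array plays the role of the Sidekiq queue so the example is standalone.

class ImportState
  attr_reader :status, :errors

  def initialize(total_jobs)
    @pending = total_jobs
    @status  = :started
    @errors  = []
  end

  # Called by every small job when it ends, whether it succeeded or not.
  # The overall status is only settled once the last job has reported in.
  def job_done(name, error = nil)
    @errors << "#{name}: #{error.message}" if error
    @pending -= 1
    return unless @pending.zero?

    @status = @errors.empty? ? :finished : :failed
  end
end

# Each entry is one small unit of work instead of one long-running job.
jobs = {
  'import_labels'     => -> { :ok },
  'import_milestones' => -> { raise 'API returned 404' },
  'import_issues'     => -> { :ok }
}

state = ImportState.new(jobs.size)
queue = jobs.to_a # stand-in for enqueuing the jobs on Sidekiq

until queue.empty?
  name, work = queue.shift
  begin
    work.call
    state.job_done(name)
  rescue StandardError => e
    # Keep the reason so it can be shown to the user later.
    state.job_done(name, e)
  end
end

puts state.status
puts state.errors.inspect
```

In a real setup the counter would have to live somewhere shared between workers (for example the database), since Sidekiq jobs do not share memory; the sketch only shows the bookkeeping.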

Related

https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/4166#note_11921862

Proposal

Description

We have plenty of import options:

  • GitHub
  • BitBucket
  • FogBugz
  • GitLab
  • GitLab Import/Export
  • Gitorious.org
  • Google Code
  • Git repository by URL

But each importer does its job in its own way, which means:

  1. Lack of a standard: for example, the comments on an issue are sometimes imported as individual notes and sometimes all inserted into the issue description.
  2. Lack of feedback about the import process: over the last few weeks we have noticed that when an importer fails, in most cases we don't know why.
  3. High cost of maintenance: it's not easy to keep them updated.
  4. Impossible to spawn a lot of small jobs instead of keeping a few long-running ones.
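
To make the first point concrete, here is a minimal sketch of a mapper that always turns remote comments into individual notes, so every importer would produce the same shape. The `Note`, `Issue`, and `IssueMapper` names and the layout of the input hash are assumptions for the example, not existing GitLab code or a real API payload.

```ruby
# Hypothetical names throughout; the hash only imitates what an importer
# might receive from a remote service.

Note  = Struct.new(:author, :body)
Issue = Struct.new(:title, :description, :notes)

class IssueMapper
  # raw: Hash with :title, :body and :comments keys (assumed shape).
  def initialize(raw)
    @raw = raw
  end

  def to_issue
    Issue.new(@raw.fetch(:title), @raw[:body].to_s, notes)
  end

  private

  # Every remote comment becomes its own note; nothing is appended
  # to the issue description.
  def notes
    Array(@raw[:comments]).map do |comment|
      Note.new(comment.fetch(:user), comment.fetch(:body))
    end
  end
end

raw = {
  title: 'Crash on startup',
  body: 'Steps to reproduce',
  comments: [
    { user: 'alice', body: 'I can reproduce this.' },
    { user: 'bob',   body: 'Fixed in master.' }
  ]
}

issue = IssueMapper.new(raw).to_issue
puts issue.description
issue.notes.each { |note| puts "#{note.author}: #{note.body}" }
```

With one mapper per remote service feeding the same `Issue` and `Note` structures, the code that persists them would not need to know where the data came from.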

We want more people on GitLab, and a lot of them are migrating from these services. So we need to ensure that this process runs smoothly and, most importantly in my view, gives the user appropriate feedback showing why their project can't be imported.

Proposal

This will be a huge refactoring, so I would like to suggest breaking it into 3 steps:

  1. Refactor importers
  2. Refactor UI/UX
  3. Make the import process async
  • Refactor Importers

    • Proposal:
    • Wait for replies on my proposal
    • Review existing code
    • Implementation/Refactoring
      • Decouple code to retrieve access tokens from GitHub API wrapper
        • Figure out authorization issues
      • Create API wrapper
        • Create Base API wrapper
        • Create GitHub API wrapper
      • Create GitHub Mappers
        • Labels
        • Milestones
        • Issues
        • Pull Requests
        • Comments
        • Comments on diff
        • User
      • Create Workflow
        • Create ImportRepository action
        • Create ImportLabels action
        • Create ImportMilestones action
        • Create FetchIssues action
        • Create FetchMergeRequests action
        • Create PersistIssue action
        • Create PersistMergeRequest action
        • Create ApplyLabels action
        • Create ImportComments action
        • Create ImportCommentsOnDiff action
        • Create ImportWiki action
      • Keep track of errors
        • API
        • Importing Issues, MRs, Comments, etc
        • Ensure that all actions will be executed
      • Change RepositoryImportWorker to use the new workflow for GitHub projects
      • Verify workflow
        • Try to import edge-case projects
      • Apply the same changes to the other importers
        • BitBucket
        • FogBugz
        • GitLab
        • Gitorious.org
        • Google Code
        • Git repository by URL
      • Refactor Import Controllers
        • GitHub
        • BitBucket
        • FogBugz
        • GitLab
        • Gitorious.org
        • Google Code
      • Add tests
      • Clean up old code
  • Refactor UI

    • Show errors when the import process fails
    • Improve UX (?)
  • Make the import process async

    • Refactor RepositoryImportWorker to spawn a lot of small jobs rather than a few long-running ones
    • API Rate Limit
    • Job states and exception handling
    • Ensure execution order for jobs, since jobs can depend on other jobs
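
The workflow, error-tracking, and ordering items above could fit together roughly as in the following sketch. It is only an illustration under assumptions: `ImportWorkflow` and its methods are hypothetical, the action names merely echo the list, and the actions run in-process rather than as Sidekiq jobs. Ordering relies on Ruby's standard `TSort` module, which is one possible way to resolve dependencies, not necessarily the one the importer should use.

```ruby
require 'tsort'

# Hypothetical sketch: the class and its behaviour are assumptions,
# not GitLab code.
class ImportWorkflow
  include TSort

  attr_reader :errors, :executed, :skipped

  def initialize
    @actions  = {} # name => { deps: [names], work: callable }
    @errors   = {} # name => error message
    @executed = []
    @skipped  = []
  end

  def add(name, deps: [], &work)
    @actions[name] = { deps: deps, work: work }
  end

  # Runs every action after its dependencies. A failure is recorded, and
  # only the actions that depend on it are skipped; the rest still run.
  # TSort raises TSort::Cyclic if the dependencies form a cycle.
  def run
    tsort.each do |name|
      if blocked?(name)
        @skipped << name
        next
      end

      begin
        @actions[name][:work].call
        @executed << name
      rescue StandardError => e
        @errors[name] = e.message
      end
    end
  end

  # TSort hooks: nodes are action names, children are their dependencies,
  # so dependencies come first in the sorted order.
  def tsort_each_node(&block)
    @actions.each_key(&block)
  end

  def tsort_each_child(name, &block)
    @actions.fetch(name)[:deps].each(&block)
  end

  private

  # An action is blocked when any dependency failed or was itself skipped.
  def blocked?(name)
    @actions[name][:deps].any? { |dep| @errors.key?(dep) || @skipped.include?(dep) }
  end
end

workflow = ImportWorkflow.new
workflow.add(:import_repository) { :ok }
workflow.add(:import_labels, deps: [:import_repository]) { :ok }
workflow.add(:fetch_issues, deps: [:import_repository]) { raise 'rate limit exceeded' }
workflow.add(:persist_issues, deps: [:fetch_issues, :import_labels]) { :ok }
workflow.add(:import_wiki) { :ok }
workflow.run

p workflow.executed
p workflow.errors
p workflow.skipped
```

Here the simulated failure in `fetch_issues` is kept with its message, `persist_issues` is skipped because it depends on it, and the unrelated `import_wiki` action still runs, which is the kind of feedback the UI step could then show to the user.
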
Edited by Rémy Coutable