GitHub Importer: allow importer to use a pool of access tokens via API
Problem Statement
GitHub Importer uses a single access token when importing projects from GitHub to GitLab. That token is typically rate limited to 5000 requests per hour.
This can significantly reduce the speed of the importer when:
- Importing multiple small to medium-sized projects concurrently: all imports drain the same token, and once it is rate limited, every import waits for the rate limit to reset
- Importing a single very large project with a lot of data
The rate limit is reached quickly, and the importer then does nothing until the hour has passed and the limit resets.
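To make the cost concrete, here is a rough back-of-the-envelope sketch. It assumes the importer spends requests as fast as the limit allows, and that each token has its own independent 5000 requests/hour budget. The function name `rate_limit_windows` and the request counts are hypothetical and used only for illustration.

```python
import math

# Typical per-token limit quoted in this issue.
REQUESTS_PER_HOUR = 5000


def rate_limit_windows(total_requests: int, tokens: int = 1) -> int:
    """Lower bound on the number of one-hour rate-limit windows an import needs.

    Assumes every token has an independent budget (hypothetical helper,
    not part of the importer).
    """
    if total_requests < 0 or tokens < 1:
        raise ValueError("total_requests must be >= 0 and tokens >= 1")
    return math.ceil(total_requests / (REQUESTS_PER_HOUR * tokens))


# A made-up import needing 60,000 API requests:
print(rate_limit_windows(60_000))     # single token
print(rate_limit_windows(60_000, 4))  # four tokens with independent limits
```

With one token, the made-up 60,000-request import spans 12 hourly windows; with four independent tokens it spans 3.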
Proposed Solution
Allow passing a list of tokens via the API to GitHub Importer so that it can rotate between them when one is rate limited.
- The tokens must come from different accounts, because all tokens from the same account share one rate limit
- The tokens must have the same privileges on the repositories being imported
- It would be useful to have this behind a feature flag, for PS engagements, in situations where a large project has to be imported within a reasonable amount of time
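The rotation described above could look roughly like the sketch below. It is not the importer's implementation: `TokenPool`, `TokenState`, and their methods are hypothetical names. It assumes the caller reports each token's remaining budget and reset time after every response (GitHub exposes these in the `x-ratelimit-remaining` and `x-ratelimit-reset` response headers), and that the caller has already made sure the tokens belong to different accounts.

```python
import time
from dataclasses import dataclass


@dataclass
class TokenState:
    token: str
    remaining: int = 5000   # assumed starting budget per token
    reset_at: float = 0.0   # epoch seconds at which the budget resets


class TokenPool:
    """Hypothetical pool that rotates to the next token when one is exhausted."""

    def __init__(self, tokens, clock=time.time):
        if not tokens:
            raise ValueError("at least one token is required")
        if len(set(tokens)) != len(tokens):
            raise ValueError("tokens must be distinct")
        self._states = [TokenState(t) for t in tokens]
        self._clock = clock
        self._index = 0

    def acquire(self):
        """Return a usable token, or None if every token is rate limited."""
        now = self._clock()
        for offset in range(len(self._states)):
            i = (self._index + offset) % len(self._states)
            state = self._states[i]
            # Usable if budget is left or the reset time has already passed.
            if state.remaining > 0 or now >= state.reset_at:
                self._index = i
                return state.token
        return None

    def update(self, token, remaining, reset_at):
        """Record the rate-limit values reported for a token."""
        for state in self._states:
            if state.token == token:
                state.remaining = remaining
                state.reset_at = reset_at
                return
        raise KeyError(token)

    def seconds_until_reset(self):
        """Seconds until the earliest token reset (0.0 if already passed)."""
        now = self._clock()
        return max(0.0, min(s.reset_at for s in self._states) - now)
```

In this sketch the importer would call `acquire()` before each request, `update()` after each response, and sleep for `seconds_until_reset()` only when `acquire()` returns `None`, that is, when the whole pool is exhausted rather than a single token.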
Possible issues
- Rotating among different users' tokens could increase achievable throughput, but beyond a reasonable limit it could be considered abuse of the GitHub API.
- The GitLab instance being imported to could become the new bottleneck.
See related conversation here and conversations copied from duplicate issue in the comment.