GitHub Import - Pagination issue
When importing the NodeJS project (nodejs/node) from GitHub in my local environment, Gitlab::GithubImport::Importer::IssuesImporter failed with the following error:
GET https://api.github.com/repos/nodejs/node/issues?direction=asc&page=459&per_page=100&sort=created&state=all: 422 - Pagination with the page parameter is not supported for large datasets, please use cursor based pagination (after/before)
It appears that GitHub no longer supports page-based pagination for large datasets, so requests may fail when importing projects with a large number of issues or pull requests. In other words, we can no longer paginate the API results simply by incrementing the page query parameter; instead, we should follow the pagination links provided in the Link response header, which include a cursor (after/before).
Note that GitHub Import already uses the pagination links supplied in the response header. However, when a stage worker is interrupted by the API rate limit and later resumed, GitHub Import rebuilds the REST API request from the last processed page number and does not include the cursor, which results in the error above.
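For illustration, here is a minimal sketch of following the Link header instead of incrementing page, using the Octokit Ruby client (which, as far as I can tell, is what the importer uses under the hood); the repository and token are placeholders:

```ruby
require 'octokit'

client = Octokit::Client.new(access_token: ENV['GITHUB_TOKEN'], per_page: 100)

# The first request can still use ordinary query parameters.
issues = client.issues('nodejs/node', state: 'all', sort: 'created', direction: 'asc')

loop do
  issues.each { |issue| puts issue.number }

  # Follow the `Link: <...>; rel="next"` header instead of incrementing `page`;
  # for large datasets the URL it points to carries an `after` cursor.
  next_rel = client.last_response.rels[:next]
  break unless next_rel

  issues = next_rel.get.data
end
```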
Proposed solution
To fix the problem, we need to change how GitHub Import resumes the stage workers. Instead of caching the last processed page number, we should cache the pagination links provided in the response header and use them when resuming the worker.
Bitbucket Import already implements this kind of pagination, and we can take a similar approach.
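A rough sketch of the resume logic under that approach, assuming an Octokit-like client and a simple read/write/delete cache interface (the method, cache key, and helper names are illustrative, not the actual importer code):

```ruby
# Illustrative only: cache the URL of the next page (which carries the cursor)
# instead of the number of the last processed page.
def each_issue(client, repo, cache, cache_key)
  cached_url = cache.read(cache_key)

  issues =
    if cached_url
      # Resume exactly where the worker stopped, cursor included.
      client.get(cached_url)
    else
      client.issues(repo, state: 'all', sort: 'created', direction: 'asc')
    end

  loop do
    issues.each { |issue| yield issue }

    next_rel = client.last_response.rels[:next]
    break unless next_rel

    # Persist the full next-page URL before fetching it, so an interrupted
    # worker resumes from the cursor rather than from a page number.
    cache.write(cache_key, next_rel.href)
    issues = next_rel.get.data
  end

  cache.delete(cache_key)
end
```

The key point is that the cached value is the full URL taken from the Link header rather than a page number, so the after cursor survives the interruption and the resumed request stays valid for large datasets.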