Skip to content

GithubImporter: Optimize Pull Request Review Importer [RUN ALL RSPEC] [RUN AS-IF-FOSS]

What does this MR do?

Problem

The Github API does not provide a way to fetch all the pull requests reviews of a project (repo), like it provides for comments, instead we have to fetch the reviews by Pull Request.

For this reason, the Gitlab::GithubImport::Importer::PullRequestsReviewsImporter¹ have to iterate over the imported pull requests and for each one do request the reviews, which might be more than one page.

If the importer hits a rate limit, the process restarts, and the imported pull requests are skipped², but the importer goes over all the review pages again.

In other words, for some projects with large number of pull requests and large number of reviews per pull request, we might end up with duplicated reviews and unnecessary API requests, which would lead to longer importing times.

Proposed solution

  • To avoid duplicated comments, besides caching the Pull Requests ids, also cache the review ids and skip the already processed ones.
  • To avoid unnecessary API requests, use the PageCounter to only request pages that weren't yet imported.

Reference

  1. !48632 (merged) - First version of the Pull Requests reviews importer
  2. !60668 (merged) - Skip already imported Pull Requests reviews
  3. Related to: #330783 (closed)

Screenshots (strongly suggested)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Danger bot

Merge request reports