Add rate limiter for PullRequest importer

Vasilii Iakliushin requested to merge 345922_limit_pull_requests_importer into master

What does this MR do and why?

Problem

There is no limit on the number of PullRequest importer jobs. The parallel importer process can generate thousands of them, which puts pressure on other services.

Solution

Limit the number of jobs created at once by spreading the load over a longer period of time.

The new max_concurrency and backoff settings configure the batch size and the delay between batches.

The importer was causing problems when we had ~1000 jobs per second. I chose 200 concurrent jobs as a number that is relatively high but well below the critical threshold. It can be adjusted later if necessary.
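
As an illustration of how a batch size and a delay between batches can spread jobs over time, here is a minimal sketch. It is not the code from this MR: `batch_delay` is a hypothetical helper, and the numbers in the example are chosen only for readability.

```ruby
# Hypothetical helper, not the importer's actual code.
# Returns the delay (in seconds) for the job at position `index`, so that
# at most `max_concurrency` jobs fall into the same time slot and
# consecutive slots are `backoff` seconds apart.
def batch_delay(index, max_concurrency:, backoff:)
  raise ArgumentError, 'max_concurrency must be positive' unless max_concurrency.positive?

  # Integer division maps jobs 0..(max_concurrency - 1) to slot 0,
  # the next max_concurrency jobs to slot 1, and so on.
  (index / max_concurrency) * backoff
end

# Example: 5 jobs, batches of 2, 60 seconds between batches.
delays = (0...5).map { |i| batch_delay(i, max_concurrency: 2, backoff: 60) }
# delays == [0, 0, 60, 60, 120]
```

In a Sidekiq setup, a delay like this could be passed to `perform_in` when enqueueing each job, which is consistent with the jobs appearing in `Sidekiq::ScheduledSet` in the output below.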

Screenshots or screen recordings

Screenshot_2022-02-21_at_19.28.19

Sidekiq::ScheduledSet.new.select { |a| a['class'] == 'Gitlab::GithubImport::ImportPullRequestWorker' }.map { |a| a['scheduled_at'] }.map { |time| Time.at(time).to_datetime.to_s }
=> ["2022-02-21T19:40:03+01:00", "2022-02-21T19:39:03+01:00", "2022-02-21T19:38:03+01:00", "2022-02-21T19:37:03+01:00", "2022-02-21T19:36:03+01:00"]

How to set up and validate locally

  1. Enable the feature flag in the Rails console:

Feature.enable(:limit_parallel_import)

  2. Set max_concurrency to 1 (to simplify testing)

  3. Restart Sidekiq (gdk restart rails-background-jobs)

  4. Open the new project page (http://127.0.0.1:3000/projects/new)

  5. Choose Import project -> GitHub

  6. Add a personal access token if one is not set

  7. Select and start importing a project with open Pull Requests

  8. In the Rails console, run the following (you may need to retry several times before scheduled jobs appear)

Sidekiq::ScheduledSet.new.select { |a| a['class'] == 'Gitlab::GithubImport::ImportPullRequestWorker' }.map { |job| Time.at(job['scheduled_at']).to_datetime.to_s }

=> ["2022-02-21T19:40:03+01:00", "2022-02-21T19:39:03+01:00", "2022-02-21T19:38:03+01:00", "2022-02-21T19:37:03+01:00", "2022-02-21T19:36:03+01:00"]
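
To check the spacing between scheduled jobs without reading timestamps by eye, a small helper along these lines may be useful. It is a sketch only: `scheduling_gaps` is a hypothetical name, and the input is the sample output shown above rather than live Sidekiq data.

```ruby
require 'time'

# Hypothetical helper: returns the gaps, in whole seconds, between
# consecutive timestamps after sorting them in ascending order.
def scheduling_gaps(timestamps)
  times = timestamps.map { |t| Time.iso8601(t) }.sort
  times.each_cons(2).map { |earlier, later| (later - earlier).to_i }
end

# Sample output copied from the Rails console session above.
scheduled = [
  "2022-02-21T19:40:03+01:00",
  "2022-02-21T19:39:03+01:00",
  "2022-02-21T19:38:03+01:00",
  "2022-02-21T19:37:03+01:00",
  "2022-02-21T19:36:03+01:00"
]

gaps = scheduling_gaps(scheduled)
# gaps == [60, 60, 60, 60]
```

With max_concurrency set to 1, evenly spaced gaps like these suggest that each job landed in its own batch, one delay interval apart.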

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Vasilii Iakliushin
