Make repository pull mirroring not depend on sidekiq queue sizes (!79097) · Merge requests · GitLab.org / GitLab

Quang-Minh Nguyen requested to merge qmnguyen0711/340630-make-repository-pull-mirroring-not-depend-on-sidekiq-queue-sizes into master Jan 25, 2022

What does this MR do and why?

For #340630 (closed)

At the time of writing, the UpdateAllMirrorsWorker uses the queue size of the ProjectImportScheduleWorker before possibly rescheduling itself. This dependency prevents us from moving to queue-per-shard completely. As a part of an initiative to deprecate queue selector, this MR is to make UpdateAllMirrorsWorker not depend on the queue size of the other worker.

In another discussion (gitlab-com/gl-infra&596 (comment 815940895)), we talked about some simpler solutions to accomplish the goal of this issue a bit earlier. Major refactoring/rewriting core mirroring logic is a too big scope for the Scalability team resource at the moment. We are also not confident about the familiarity with the flow. There were two solutions in the discussion:

Loop and count the occurrences of ProjectImportScheduleWorker in its queue. This approach was applied in the background migration. The RPS of UpdateAllMirrorsWorker is too high, unfortunately. We can't afford looping through all the jobs of the queue every single second.
Alternatively, we can count the execution of ProjectImportScheduledWorker using LimitedCapacity::JobTracker. In the latest refactoring effort, that utility becomes independent of the LimitedCapacity itself. It does many things, but we only need job registering when a job starts, and removing when it finishes. Worker size query is now down to O(1). The code change is also minimal. The tradeoff is some overhead when executing the job, but I think it's a fair tradeoff.

This MR makes a significant semantic change: UpdateAllMirrorsWorker now waits until all ProjectImportScheduleWorker jobs finish. At the moment it is waiting for those jobs are picked up. Fortunately, ProjectImportScheduleWorker is just a really thin wrapper for real repository import. From the monitoring dashboard, the average execution time of jobs from this worker is under 1 second. It means this semantic change won't yield any significant impact.

Screenshots or screen recordings

This MR doesn't yield any user-facing change.

How to set up and validate locally

Because the ProjectScheduleImportWorker runs really fast, it's quite hard to simulate a job execution. Therefore, to test on local environment, we must add a significant sleep amount to that worker:

diff --git a/ee/app/workers/project_import_schedule_worker.rb b/ee/app/workers/project_import_schedule_worker.rb
index 9d8d106168e..43e4ab075ab 100644
--- a/ee/app/workers/project_import_schedule_worker.rb
+++ b/ee/app/workers/project_import_schedule_worker.rb
@@ -26,6 +26,8 @@ class ProjectImportScheduleWorker

   def perform(project_id)
     job_tracker.register(jid, capacity)
+
+    sleep 20

     return if Gitlab::Database.read_only?

Restart rails console and rails background jobs
Setup a lot of mirror projects
Run the following command in local environment

pry(main)> UpdateAllMirrorsWorker.new.perform

It's likely that the above command is blocked for 20 seconds. It's a good sign that the worker is waiting for the jobs to complete
Access sidekiq dashboard, a bunch of ProjectImportScheduleWorker jobs are being executed.
Open another console. Run the following command. The result indicates that there are 30 running jobs.

pry(main)> LimitedCapacity::JobTracker.new(ProjectImportScheduleWorker.name).count
=> 30

Wait for 20 seconds. Try the above command again, we'll find the result returns 0, indicating all the jobs are being executed. The Sidekiq dashboard stated the same.

pry(main)> LimitedCapacity::JobTracker.new(ProjectImportScheduleWorker.name).count
=> 0

Check the logs, or Sidekiq dashboard, we can see that UpdateAllMirrorWorker is re-scheduled. The first console is unlocked.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Edited Jan 26, 2022 by Quang-Minh Nguyen

Make repository pull mirroring not depend on sidekiq queue sizes

What does this MR do and why?

Screenshots or screen recordings

How to set up and validate locally

MR acceptance checklist

Merge request reports