
Update job coordinator for compatibility with Sidekiq sharding

What does this MR do and why?

This MR makes the pending_jobs and steal methods of the job coordinator shard-aware by wrapping them so that they look up jobs on the Sidekiq shard the worker is routed to.

Shard-awareness is a GitLab.com-only concern, as sharding is not a feature that we are rolling out to everyone. It is a horizontal scaling approach for Sidekiq that applies to GitLab.com because of the scale we operate at.

See gitlab-com/gl-infra/scalability#2817 (closed)
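As a rough illustration of what "shard-aware" means here, the sketch below models each Sidekiq Redis instance as an in-memory object and has the coordinator visit all of them when collecting pending jobs. Job, FakeShard and ShardAwareCoordinator are hypothetical names made up for this example; they are not the classes touched by this MR and do not use the Sidekiq API.

```ruby
# Hypothetical stand-ins for illustration only; not GitLab's or Sidekiq's classes.
Job = Struct.new(:klass, :args)

class FakeShard
  attr_reader :name, :scheduled, :enqueued

  def initialize(name)
    @name = name
    @scheduled = [] # jobs waiting in the "schedule" set of this instance
    @enqueued = []  # jobs already sitting in a queue of this instance
  end
end

class ShardAwareCoordinator
  def initialize(worker_class, shards)
    @worker_class = worker_class
    @shards = shards
  end

  # Visit every shard so that jobs routed away from the default
  # Redis instance are still counted as pending.
  def pending_jobs
    @shards.flat_map do |shard|
      (shard.scheduled + shard.enqueued).select { |job| job.klass == @worker_class }
    end
  end
end

main = FakeShard.new("main")
shard_01 = FakeShard.new("queues_shard_01")
shard_01.scheduled << Job.new("BackgroundMigrationWorker", ["Foo", "hello"])

coordinator = ShardAwareCoordinator.new("BackgroundMigrationWorker", [main, shard_01])
puts coordinator.pending_jobs.size
```

A coordinator that only looked at the main instance would report no pending jobs in this scenario, which is the gap the MR closes.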

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

Set-up

  1. Run Docker to create an extra Redis instance
docker run -p 6378:6379 -d redis:6.0-alpine
  2. Update gitlab.yml
## Sidekiq
  sidekiq:
    log_format: json # (default is also supported)
    routing_rules:
      - ["tags=needs_own_queue", null]
      - ["worker_name=BackgroundMigrationWorker", "default", "queues_shard_01"]
      - ["*", "default"]
  3. Update config/redis.yml
➜  gitlab git:(sc1-sidekiq-shard-routing-compat-job-coordinator) ✗ cat config/redis.yml
---
development:
  queues_shard_01:
    url: "redis://localhost:6378"
  4. Create a dummy feature flag config file
➜  gitlab git:(sc1-sidekiq-shard-routing-compat-job-coordinator) ✗ cat config/feature_flags/ops/sidekiq_route_to_queues_shard_01.yml

---
name: sidekiq_route_to_queues_shard_01
feature_issue_url:
introduced_by_url:
rollout_issue_url:
milestone: '16.9'
group: group::scalability
type: ops
default_enabled: false
  5. Apply this diff to allow dummy jobs to pass gracefully (optional)
diff --git a/lib/gitlab/background_migration/job_coordinator.rb b/lib/gitlab/background_migration/job_coordinator.rb
index 09e2b2a32197..09c55624f93b 100644
--- a/lib/gitlab/background_migration/job_coordinator.rb
+++ b/lib/gitlab/background_migration/job_coordinator.rb
@@ -88,6 +88,7 @@ def steal(steal_class, retry_dead_jobs: false)

               begin
                 perform(migration_class, migration_args) if job.delete
+                puts "performed"
               rescue Exception # rubocop:disable Lint/RescueException
                 worker_class # enqueue this migration again
                   .perform_async(migration_class, migration_args)
@@ -101,7 +102,7 @@ def steal(steal_class, retry_dead_jobs: false)

       def perform(class_name, arguments)
         with_shared_connection do
-          migration_instance_for(class_name).perform(*arguments)
+          # migration_instance_for(class_name).perform(*arguments)
         end
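
To make the routing_rules from the gitlab.yml step easier to read, here is a minimal sketch of how such a rule list could be evaluated. It assumes rules are checked top to bottom, the first match wins, and the optional third element names the Redis shard. The matcher is deliberately simplified (only "*" and a single key=value term), and rule_matches? and route are hypothetical helpers, not GitLab's router.

```ruby
# Rules copied from the gitlab.yml step: [query, queue, optional shard].
ROUTING_RULES = [
  ["tags=needs_own_queue", nil],
  ["worker_name=BackgroundMigrationWorker", "default", "queues_shard_01"],
  ["*", "default"]
].freeze

# Simplified matcher (assumption): handles "*" and one "key=value" term only.
def rule_matches?(query, worker)
  return true if query == "*"

  key, value = query.split("=", 2)
  case key
  when "worker_name" then worker[:name] == value
  when "tags" then worker.fetch(:tags, []).include?(value)
  else false
  end
end

# Returns [queue, shard]. A nil queue means the rule sets no explicit queue;
# a nil shard means the job stays on the default Redis instance.
def route(worker, rules = ROUTING_RULES)
  _query, queue, shard = rules.find { |rule| rule_matches?(rule.first, worker) }
  [queue, shard]
end

p route({ name: "BackgroundMigrationWorker" })
p route({ name: "SomeOtherWorker" })
```

With these rules, only BackgroundMigrationWorker is sent to queues_shard_01, which is why the test steps below inspect the Redis instance on port 6378.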

Testing

  1. Open a GDK Rails console and enable the feature flags
Feature.enable(:enable_sidekiq_shard_router)
Feature.enable(:sidekiq_route_to_queues_shard_01)
  2. Schedule a job to steal
Loading development environment (Rails 7.0.8.1)
[1] pry(main)> BackgroundMigrationWorker.perform_in(1.hour,'Foo', 'hello')
=> "24d63d35571d23d59f9d1bb4"
  3. Verify in Redis that the job was scheduled on the new instance
➜  gitlab git:(sc1-sidekiq-shard-routing-compat-job-coordinator) ✗ redis-cli -p 6378 zcard schedule
(integer) 1
  4. Steal the job. Because the job coordinator is shard-aware, the job is fetched from the new Redis instance rather than from GDK's default one.
[1] pry(main)> coor = Gitlab::BackgroundMigration::JobCoordinator.for_tracking_database('main')
=> #<Gitlab::BackgroundMigration::JobCoordinator:0x0000000164932eb0 @worker_class=BackgroundMigrationWorker>
[2] pry(main)> out = coor.steal('Foo')
performed
=> [#<Sidekiq::ScheduledSet:0x000000015fa1fc38 @_size=0, @name="schedule">,
 #<Sidekiq::Queue:0x000000015f9f76c0 @name="default", @rname="queue:default">]
  5. Verify that the scheduled job is stolen
➜  gitlab git:(sc1-sidekiq-shard-routing-compat-job-coordinator) ✗ redis-cli -p 6378 zcard schedule
(integer) 0
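
The test steps above can be mirrored with a small in-memory model. This is a sketch under stated assumptions: FakeScheduledSet and steal_from_all_shards are hypothetical names, the scheduled set is a plain array rather than a Redis sorted set, and "performing" a job just records it. It keeps the delete-then-perform order shown in the diff (perform only if job.delete succeeds).

```ruby
# Hypothetical in-memory model of one shard's scheduled set; not the Sidekiq API.
class FakeScheduledSet
  def initialize
    @jobs = []
  end

  def push(migration_class, migration_args)
    @jobs << [migration_class, migration_args]
  end

  def size
    @jobs.size
  end

  # Iterate over a copy so jobs can be deleted during iteration.
  def each(&block)
    @jobs.dup.each(&block)
  end

  # Returns true only if this call actually removed the job.
  def delete(job)
    index = @jobs.index(job)
    return false if index.nil?

    @jobs.delete_at(index)
    true
  end
end

# Walk every shard and "perform" each matching job that we managed to delete.
def steal_from_all_shards(shards, steal_class)
  performed = []
  shards.each_value do |scheduled_set|
    scheduled_set.each do |job|
      migration_class, migration_args = job
      next unless migration_class == steal_class

      performed << [migration_class, migration_args] if scheduled_set.delete(job)
    end
  end
  performed
end

shards = { "main" => FakeScheduledSet.new, "queues_shard_01" => FakeScheduledSet.new }
shards["queues_shard_01"].push("Foo", ["hello"])

puts shards["queues_shard_01"].size # one job scheduled, like zcard before stealing
steal_from_all_shards(shards, "Foo")
puts shards["queues_shard_01"].size # no jobs left, like zcard after stealing
```

Deleting before performing means that only the process whose delete succeeded runs the migration, so two concurrent stealers do not run the same job twice.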
Edited by Sylvester Chin
