Audit Sidekiq call sites in the monolith for shard-awareness requirements

Overview

The approach for application-side routing of queues to Redis instances will rely heavily on Sidekiq::Client.via, which can safely change the Redis pool used by:

  1. perform_async/perform_in
  2. Sidekiq APIs

For (1), we can patch the perform_* variants the same way we forbid Sidekiq in transactions.
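
A stdlib-only model of the patch for (1), as a sketch rather than the actual implementation. FakePool, MiniSidekiq, SHARD_POOLS, QUEUE_TO_SHARD and ShardAwareEnqueue are hypothetical stand-ins, not Sidekiq classes; MiniSidekiq.via only mimics the idea behind Sidekiq::Client.via (swap a thread-local pool for the duration of a block, then restore it). The patch is prepended so it wraps the original perform_async/perform_in and re-enters them inside via with the pool chosen for the worker's queue. In the monolith it would presumably be prepended once to Sidekiq's worker class methods, like the transaction guard, rather than per class as done here.

```ruby
# Stdlib-only model. FakePool, MiniSidekiq, SHARD_POOLS, QUEUE_TO_SHARD and
# ShardAwareEnqueue are hypothetical stand-ins, NOT Sidekiq's real API.

# A "pool" is just a name plus the payloads pushed to it.
FakePool = Struct.new(:name, :pushed)

module MiniSidekiq
  DEFAULT_POOL = FakePool.new("redis-sidekiq", [])

  # Mimics the idea behind Sidekiq::Client.via: swap a thread-local pool for
  # the duration of the block and always restore the previous value.
  def self.via(pool)
    previous = Thread.current[:mini_sidekiq_pool]
    Thread.current[:mini_sidekiq_pool] = pool
    yield
  ensure
    Thread.current[:mini_sidekiq_pool] = previous
  end

  def self.current_pool
    Thread.current[:mini_sidekiq_pool] || DEFAULT_POOL
  end

  # Minimal worker mixin whose perform_* methods push to the current pool.
  module Worker
    def self.included(base)
      base.extend(ClassMethods)
    end

    module ClassMethods
      def queue_name
        name.downcase
      end

      def perform_async(*args)
        push({ "args" => args })
      end

      def perform_in(interval, *args)
        push({ "args" => args, "at" => Time.now.to_f + interval })
      end

      private

      # Records the payload and returns the name of the pool that received it.
      def push(payload)
        pool = MiniSidekiq.current_pool
        pool.pushed << payload.merge({ "queue" => queue_name })
        pool.name
      end
    end
  end
end

# Hypothetical routing table: queue name => shard pool.
SHARD_POOLS = { "catchall" => FakePool.new("redis-shard-catchall", []) }.freeze
QUEUE_TO_SHARD = { "mailerworker" => "catchall" }.freeze

# Prepended in front of the original perform_* methods; re-enters them inside
# MiniSidekiq.via when the worker's queue is routed to a shard.
module ShardAwareEnqueue
  %i[perform_async perform_in].each do |method_name|
    define_method(method_name) do |*args|
      shard = QUEUE_TO_SHARD[queue_name]
      return super(*args) unless shard # unrouted queues keep the default pool

      MiniSidekiq.via(SHARD_POOLS.fetch(shard)) { super(*args) }
    end
  end
end

class MailerWorker
  include MiniSidekiq::Worker
  singleton_class.prepend(ShardAwareEnqueue)
end

class ReportWorker
  include MiniSidekiq::Worker
  singleton_class.prepend(ShardAwareEnqueue)
end

p MailerWorker.perform_async(1)  # prints "redis-shard-catchall"
p ReportWorker.perform_in(60, 2) # prints "redis-sidekiq"
```

Queues without a routing entry fall through to the default pool, so in this model routing can be introduced one queue at a time.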

For (2), these call sites need to be updated for shard-awareness:

  • API::SidekiqMetrics
  • Gitlab::SidekiqQueue
  • Gitlab::BackgroundMigration::JobCoordinator
  • Gitlab::Database::Migrations::SidekiqHelpers
  • Rake task gitlab:snippets:migration_status
  • Sidekiq::Web
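
For the call sites in (2), an API read only sees whichever pool is currently active, so a shard-aware call site has to either target a single shard or visit every shard inside via. The stdlib-only model below sketches the aggregating variant, the kind of thing a metrics endpoint might need. MiniSidekiq (with its via, redis and queue_size helpers), SHARDS, queue_size_by_shard and total_queue_size are hypothetical; the model assumes, as this issue does, that API reads go through the pool selected by via.

```ruby
# Stdlib-only model. MiniSidekiq, SHARDS, queue_size_by_shard and
# total_queue_size are hypothetical stand-ins, NOT Sidekiq's real API.
module MiniSidekiq
  # Each "pool" is a Hash of queue name => enqueued job ids.
  DEFAULT_POOL = { "default" => %w[a b], "mailers" => %w[c] }.freeze

  # Mimics Sidekiq::Client.via: thread-local pool override for one block.
  def self.via(pool)
    previous = Thread.current[:mini_sidekiq_pool]
    Thread.current[:mini_sidekiq_pool] = pool
    yield
  ensure
    Thread.current[:mini_sidekiq_pool] = previous
  end

  # Like Sidekiq.redis: yields whichever pool is active on this thread.
  def self.redis
    yield(Thread.current[:mini_sidekiq_pool] || DEFAULT_POOL)
  end

  # Stand-in for an API read such as a queue's size.
  def self.queue_size(queue)
    redis { |conn| conn.fetch(queue, []).size }
  end
end

# Hypothetical shard map: shard name => pool.
SHARDS = {
  "main" => MiniSidekiq::DEFAULT_POOL,
  "catchall" => { "default" => %w[d e f] }.freeze
}.freeze

# Per-shard breakdown: run the same API call once per pool inside via.
def queue_size_by_shard(queue)
  SHARDS.transform_values do |pool|
    MiniSidekiq.via(pool) { MiniSidekiq.queue_size(queue) }
  end
end

# Aggregate across shards, e.g. for a metrics endpoint.
def total_queue_size(queue)
  queue_size_by_shard(queue).values.sum
end

p MiniSidekiq.queue_size("default") # prints 2 (default pool only)
p total_queue_size("default")       # prints 5 (main + catchall)
```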

Gitlab::HashedStorage::Migrator is no longer relevant because it was removed in commit aa06a97aa7215a291104efcda96e666b6d143a87 (verify with git show aa06a97aa7215a291104efcda96e666b6d143a87).


Exceptions to Sidekiq::Client.via

What Sidekiq::Client.via does not handle is Sidekiq's internal use of Sidekiq::Client.new(...).push in:

Once a Sidekiq::Client instance is created, the @redis_pool it uses does not change: raw_push uses @redis_pool directly (https://github.com/sidekiq/sidekiq/blob/v7.1.6/lib/sidekiq/client.rb#L225).
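
A stdlib-only model of this pitfall. MiniSidekiq and its Client are hypothetical stand-ins that reproduce only the behaviour described above, in simplified form: the pool is resolved once in the constructor (explicit argument, then the thread-local set by via, then a default), and push afterwards only uses the captured @redis_pool, much like raw_push. A long-lived instance built before entering via therefore ignores it.

```ruby
# Stdlib-only model. MiniSidekiq and MiniSidekiq::Client are hypothetical;
# only the "pool captured in the constructor" behaviour is illustrated.
module MiniSidekiq
  DEFAULT_POOL = "redis-sidekiq"

  # Mimics Sidekiq::Client.via: thread-local pool override for one block.
  def self.via(pool)
    previous = Thread.current[:mini_sidekiq_pool]
    Thread.current[:mini_sidekiq_pool] = pool
    yield
  ensure
    Thread.current[:mini_sidekiq_pool] = previous
  end

  class Client
    # The pool is resolved exactly once, when the client is built...
    def initialize(pool: nil)
      @redis_pool = pool || Thread.current[:mini_sidekiq_pool] || DEFAULT_POOL
    end

    # ...and push only ever uses the captured @redis_pool (returned here).
    def push(_item)
      @redis_pool
    end
  end
end

long_lived = MiniSidekiq::Client.new # built outside any via block

MiniSidekiq.via("redis-shard-catchall") do
  p long_lived.push({})              # prints "redis-sidekiq": via is ignored
  p MiniSidekiq::Client.new.push({}) # prints "redis-shard-catchall"
end
```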

The workaround is fairly simple: we can pass our own scheduled enqueuer through config[:scheduled_enq], which selects the appropriate Sidekiq::Client instance based on the job popped from the sorted set. A draft is at https://gitlab.com/gitlab-org/gitlab/-/blob/20f49120c9dadb117963f558a250daa61271684e/lib/gitlab/sidekiq_sharding/scheduled_enq.rb
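
A minimal sketch of the routing part of such an enqueuer, under stated assumptions: FakeClient, QUEUE_TO_POOL, DEFAULT_POOL and RoutingEnq are hypothetical, the enqueuer is handed the raw JSON payloads popped from the sorted set, and routing is by the job's queue. It keeps one long-lived client per pool and picks one per job, so the fixed @redis_pool works in its favour. How Sidekiq instantiates and calls the class passed through config[:scheduled_enq] is not modelled; the draft is the reference for that.

```ruby
require "json"

# Stdlib-only sketch. FakeClient, QUEUE_TO_POOL, DEFAULT_POOL and RoutingEnq are
# hypothetical; the config[:scheduled_enq] integration is NOT modelled.

# Stand-in for Sidekiq::Client: bound to one pool for its whole lifetime.
class FakeClient
  attr_reader :pool, :pushed

  def initialize(pool)
    @pool = pool
    @pushed = []
  end

  def push(job)
    @pushed << job
    job["jid"]
  end
end

# Hypothetical routing: queue name => Redis pool name.
QUEUE_TO_POOL = { "mailers" => "redis-shard-catchall" }.freeze
DEFAULT_POOL = "redis-sidekiq"

class RoutingEnq
  attr_reader :clients

  def initialize
    # One long-lived client per pool, created on first use and then reused.
    @clients = Hash.new { |cache, pool| cache[pool] = FakeClient.new(pool) }
  end

  # Takes raw JSON payloads popped from the sorted set and pushes each one
  # through the client for the pool its queue is routed to.
  def enqueue(raw_jobs)
    raw_jobs.map do |raw|
      job = JSON.parse(raw)
      client_for(job["queue"]).push(job)
    end
  end

  private

  def client_for(queue)
    @clients[QUEUE_TO_POOL.fetch(queue, DEFAULT_POOL)]
  end
end

enq = RoutingEnq.new
enq.enqueue([
  JSON.generate({ "jid" => "1", "queue" => "mailers", "class" => "MailerWorker" }),
  JSON.generate({ "jid" => "2", "queue" => "default", "class" => "ReportWorker" })
])
p enq.clients.keys # prints ["redis-shard-catchall", "redis-sidekiq"]
```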


Crons

Cron jobs are not an issue, since sidekiq-cron calls .perform_async on the worker_klass.


We store klass in the cron job hash, so sidekiq-cron calls enqueue_sidekiq_worker, which does klass_const.set(queue: queue_name_with_prefix).perform_async(*enqueue_args). For example:

redis /Users/sylvesterchin/work/gitlab-development-kit/redis/redis.socket[1]> hgetall cron_job:incident_sla_exceeded_check_worker
 1) "last_enqueue_time"
 2) "2024-02-07 05:58:04 +0000"
 3) "klass"
 4) "IncidentManagement::IncidentSlaExceededCheckWorker"
 5) "active_job"
 6) "0"
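
A stdlib-only model of that enqueue path, with one caveat worth checking. If, as appears to be the case in Sidekiq 7, set(...) returns a setter whose perform_async goes straight to the worker's client_push (and the class-level perform_async itself delegates to such a setter), then a patch placed only on the class-level perform_* methods would not see the cron call. This sketch therefore routes inside client_push, which both paths reach. MiniWorker, its Setter, ROUTES, the pool names and the "cronjob" queue are hypothetical; the cron hash mirrors the klass field shown above.

```ruby
# Stdlib-only model. MiniWorker, its Setter, ROUTES, the pool names and the
# "cronjob" queue are hypothetical. The call shape (set returns a setter whose
# perform_async ends in the class's client_push) is ASSUMED to mirror Sidekiq 7.
ROUTES = { "cronjob" => "redis-shard-catchall" }.freeze

module MiniWorker
  def self.included(base)
    base.extend(ClassMethods)
  end

  # Stand-in for the object returned by set(...).
  class Setter
    def initialize(klass, opts)
      @klass = klass
      @opts = opts
    end

    def perform_async(*args)
      @klass.client_push(@opts.merge({ "args" => args }))
    end
  end

  module ClassMethods
    def set(opts)
      Setter.new(self, opts.transform_keys(&:to_s))
    end

    def perform_async(*args)
      Setter.new(self, {}).perform_async(*args)
    end

    # Both enqueue paths converge here, so this is where routing is applied.
    # Returns the chosen pool name instead of actually pushing.
    def client_push(item)
      ROUTES.fetch(item.fetch("queue", "default"), "redis-sidekiq")
    end
  end
end

module IncidentManagement
  class IncidentSlaExceededCheckWorker
    include MiniWorker
  end
end

# Roughly what sidekiq-cron does with the stored klass (queue name is made up).
cron_job = { "klass" => "IncidentManagement::IncidentSlaExceededCheckWorker", "queue" => "cronjob" }
klass_const = Object.const_get(cron_job["klass"])
p klass_const.set(queue: cron_job["queue"]).perform_async # prints "redis-shard-catchall"
```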

Polling is done using Sidekiq.redis (https://github.com/sidekiq-cron/sidekiq-cron/blob/master/lib/sidekiq/cron/job.rb#L26), but we will likely disable cron polling on any shard other than redis-sidekiq. I feel that distributed cron job polling is not worth solving at this stage, since cron polling does not scale the same way regular jobs do.

Edited by Sylvester Chin