Skip to content

Measure Sidekiq enqueue latency for scheduled jobs

Stan Hu requested to merge sh-sidekiq-enqueue-latency into master

What does this MR do and why?

When Sidekiq schedules a job to be run later, it sets the at field with the desired time and inserts the payload into the scheduled queue. Sidekiq runs a separate thread that polls this queue via a Redis zrangebyscore command and inserts jobs that have passed the scheduled time. Sidekiq uses a random delay that is based on the number of processes running at a given time.

We observed that this enqueuing of scheduled jobs can add an unexpected, significant delay and increase CPU utilization on Redis as more Sidekiq processes connect. Before we can make any tweaks to this configuration, we should first measure how much delay we have.

This commit adds a enqueue_latency_s for scheduled jobs to record how long this delay was. In the Sidekiq client middleware, we have access to the at field before it is deleted by Sidekiq, so we save it as the scheduled_at field. We subtract the difference between enqueued_at and scheduled_at and log enqueue_latency_s.

Relates to:

Sample log message

See enqueue_latency_s on the last line:

{
   "severity":"INFO",
   "time":"2021-10-02T00:21:24.046Z",
   "class":"WebHookWorker",
   "args":[
      "2",
      "[FILTERED]",
      "merge_request_hooks"
   ],
   "retry":4,
   "queue":"web_hook",
   "backtrace":true,
   "version":0,
   "dead":false,
   "jid":"d27a5a9fe4f317c1cc78a31c",
   "created_at":"2021-10-02T00:21:18.365Z",
   "meta.user":"root",
   "meta.project":"flightjs/Flight",
   "meta.root_namespace":"flightjs",
   "meta.subscription_plan":"default",
   "meta.caller_id":"Projects::MergeRequests::CreationsController#create",
   "meta.remote_ip":"127.0.0.1",
   "meta.related_class":"ProjectHook",
   "meta.feature_category":"code_review",
   "meta.client_id":"user/1",
   "correlation_id":"bc55b5581e3d86111b2f235b13258254",
   "uber-trace-id":"5d8e685d234755ab:5d8e685d234755ab:0:1",
   "worker_data_consistency":"delayed",
   "wal_locations":{
      "main":"0/A2E30358"
   },
   "scheduled_at":1633134079.3654552,
   "idempotency_key":"resque:gitlab:duplicate:web_hook:3a62dfec64e6e74deddbafd85479d44c5c3f5d1643b627185c17a14c5b82e9ad",
   "enqueued_at":"2021-10-02T00:21:24.044Z",
   "job_size_bytes":3971,
   "pid":49294,
   "message":"WebHookWorker JID-d27a5a9fe4f317c1cc78a31c: start",
   "job_status":"start",
   "scheduling_latency_s":0.001807,
   "enqueue_latency_s":4.678918
}

How to set up and validate locally

  1. Check out this branch.
  2. Run gdk restart rails
  3. In the Rails console, you can fire a WebHook:
WebHookWorker.new.perform_in(1.second, WebHook.first.id, {}, :push_hooks)

Alternatively, create a Web Hook with merge request events and create a merge request.

  1. gdk tail rails-background-jobs | grep enqueue_latency

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Stan Hu

Merge request reports