Measure Sidekiq enqueue latency for scheduled jobs
What does this MR do and why?
When Sidekiq schedules a job to be run later, it sets the at
field
with the desired time and inserts the payload into the scheduled
queue. Sidekiq runs a separate thread that polls this queue via a Redis
zrangebyscore
command and inserts jobs that have passed the scheduled
time. Sidekiq uses a random delay that is based on the number of
processes running at a given time.
We observed that this enqueuing of scheduled jobs can add an unexpected, significant delay and increase CPU utilization on Redis as more Sidekiq processes connect. Before we can make any tweaks to this configuration, we should first measure how much delay we have.
This commit adds a enqueue_latency_s
for scheduled jobs to record how
long this delay was. In the Sidekiq client middleware, we have access to
the at
field before it is deleted by Sidekiq, so we save it as the
scheduled_at
field. We subtract the difference between enqueued_at
and scheduled_at
and log enqueue_latency_s
.
Relates to:
Sample log message
See enqueue_latency_s
on the last line:
{
"severity":"INFO",
"time":"2021-10-02T00:21:24.046Z",
"class":"WebHookWorker",
"args":[
"2",
"[FILTERED]",
"merge_request_hooks"
],
"retry":4,
"queue":"web_hook",
"backtrace":true,
"version":0,
"dead":false,
"jid":"d27a5a9fe4f317c1cc78a31c",
"created_at":"2021-10-02T00:21:18.365Z",
"meta.user":"root",
"meta.project":"flightjs/Flight",
"meta.root_namespace":"flightjs",
"meta.subscription_plan":"default",
"meta.caller_id":"Projects::MergeRequests::CreationsController#create",
"meta.remote_ip":"127.0.0.1",
"meta.related_class":"ProjectHook",
"meta.feature_category":"code_review",
"meta.client_id":"user/1",
"correlation_id":"bc55b5581e3d86111b2f235b13258254",
"uber-trace-id":"5d8e685d234755ab:5d8e685d234755ab:0:1",
"worker_data_consistency":"delayed",
"wal_locations":{
"main":"0/A2E30358"
},
"scheduled_at":1633134079.3654552,
"idempotency_key":"resque:gitlab:duplicate:web_hook:3a62dfec64e6e74deddbafd85479d44c5c3f5d1643b627185c17a14c5b82e9ad",
"enqueued_at":"2021-10-02T00:21:24.044Z",
"job_size_bytes":3971,
"pid":49294,
"message":"WebHookWorker JID-d27a5a9fe4f317c1cc78a31c: start",
"job_status":"start",
"scheduling_latency_s":0.001807,
"enqueue_latency_s":4.678918
}
How to set up and validate locally
- Check out this branch.
- Run
gdk restart rails
- In the Rails console, you can fire a WebHook:
WebHookWorker.new.perform_in(1.second, WebHook.first.id, {}, :push_hooks)
Alternatively, create a Web Hook with merge request events and create a merge request.
gdk tail rails-background-jobs | grep enqueue_latency
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.