Measure Sidekiq enqueue latency for scheduled jobs
What does this MR do and why?
When Sidekiq schedules a job to be run later, it sets the at field
with the desired time and inserts the payload into the scheduled
queue. Sidekiq runs a separate thread that polls this queue via a Redis
zrangebyscore command and inserts jobs that have passed the scheduled
time. Sidekiq uses a random delay that is based on the number of
processes running at a given time.
We observed that this enqueuing of scheduled jobs can add an unexpected, significant delay and increase CPU utilization on Redis as more Sidekiq processes connect. Before we can make any tweaks to this configuration, we should first measure how much delay we have.
This commit adds a enqueue_latency_s for scheduled jobs to record how
long this delay was. In the Sidekiq client middleware, we have access to
the at field before it is deleted by Sidekiq, so we save it as the
scheduled_at field. We subtract the difference between enqueued_at
and scheduled_at and log enqueue_latency_s.
Relates to:
Sample log message
See enqueue_latency_s on the last line:
{
"severity":"INFO",
"time":"2021-10-02T00:21:24.046Z",
"class":"WebHookWorker",
"args":[
"2",
"[FILTERED]",
"merge_request_hooks"
],
"retry":4,
"queue":"web_hook",
"backtrace":true,
"version":0,
"dead":false,
"jid":"d27a5a9fe4f317c1cc78a31c",
"created_at":"2021-10-02T00:21:18.365Z",
"meta.user":"root",
"meta.project":"flightjs/Flight",
"meta.root_namespace":"flightjs",
"meta.subscription_plan":"default",
"meta.caller_id":"Projects::MergeRequests::CreationsController#create",
"meta.remote_ip":"127.0.0.1",
"meta.related_class":"ProjectHook",
"meta.feature_category":"code_review",
"meta.client_id":"user/1",
"correlation_id":"bc55b5581e3d86111b2f235b13258254",
"uber-trace-id":"5d8e685d234755ab:5d8e685d234755ab:0:1",
"worker_data_consistency":"delayed",
"wal_locations":{
"main":"0/A2E30358"
},
"scheduled_at":1633134079.3654552,
"idempotency_key":"resque:gitlab:duplicate:web_hook:3a62dfec64e6e74deddbafd85479d44c5c3f5d1643b627185c17a14c5b82e9ad",
"enqueued_at":"2021-10-02T00:21:24.044Z",
"job_size_bytes":3971,
"pid":49294,
"message":"WebHookWorker JID-d27a5a9fe4f317c1cc78a31c: start",
"job_status":"start",
"scheduling_latency_s":0.001807,
"enqueue_latency_s":4.678918
}
How to set up and validate locally
- Check out this branch.
- Run
gdk restart rails - In the Rails console, you can fire a WebHook:
WebHookWorker.new.perform_in(1.second, WebHook.first.id, {}, :push_hooks)
Alternatively, create a Web Hook with merge request events and create a merge request.
gdk tail rails-background-jobs | grep enqueue_latency
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.