Slow execution of Sidekiq job Ci::SyncReportsToReportApprovalRulesWorker
Summary
We have a regular recurring issue on a dedicated tenant still_silver_kangaroo where the execution time of the high urgency cpu-bound Sidekiq job Ci::SyncReportsToReportApprovalRulesWorker spikes and delays the whole high urgency cpu-bound queue, impacting the wider instance. There is no infrastructure saturation at these times and it looks like an application side performance issue.
Example log from a slow job:
Sanitised log extract
{
"correlation_id": "01KH7EW2JPP0YMVRYGA7MPK5M4",
"meta.caller_id": "Security::SyncProjectPolicyWorker",
"severity": "INFO",
"queue": "urgent_cpu_bound",
"@timestamp": "2026-02-11T23:05:42.040193372Z",
"duration_s": 216.172466,
"queue_duration_s": 0.007047,
"scheduling_latency_s": 0.007125,
"db_duration_s": 35.921989,
"db_primary_duration_s": 35.92,
"db_replica_duration_s": 0,
"db_primary_txn_duration_s": 0,
"db_replica_txn_duration_s": 0,
"db_primary_txn_max_duration_s": 0,
"db_replica_txn_max_duration_s": 0,
"db_main_duration_s": 29.346,
"db_ci_duration_s": 6.574,
"db_main_replica_duration_s": 0,
"db_ci_replica_duration_s": 0,
"db_main_txn_duration_s": 0.027,
"db_ci_txn_duration_s": 0,
"db_main_replica_txn_duration_s": 0,
"db_ci_replica_txn_duration_s": 0,
"db_main_txn_max_duration_s": 0.023,
"db_ci_txn_max_duration_s": 0,
"db_main_replica_txn_max_duration_s": 0,
"db_ci_replica_txn_max_duration_s": 0,
"redis_duration_s": 1.233852,
"redis_cache_duration_s": 0.32225,
"redis_queues_duration_s": 0.004107,
"redis_queues_metadata_duration_s": 0.184265,
"redis_shared_state_duration_s": 0.72323,
"external_http_duration_s": 8.20343656919431,
"exclusive_lock_wait_duration_s": 0,
"exclusive_lock_hold_duration_s": 0,
"redis_calls": 91,
"redis_cache_calls": 37,
"redis_queues_calls": 13,
"redis_queues_metadata_calls": 10,
"redis_shared_state_calls": 31,
"redis_read_bytes": 378835,
"redis_write_bytes": 12089,
"redis_cache_read_bytes": 378547,
"redis_cache_write_bytes": 2294,
"redis_queues_read_bytes": 132,
"redis_queues_write_bytes": 6665,
"redis_queues_metadata_read_bytes": 14,
"redis_queues_metadata_write_bytes": 1657,
"redis_shared_state_read_bytes": 142,
"redis_shared_state_write_bytes": 1473,
"job_size_bytes": 8,
"mem_bytes": 703123898,
"mem_total_bytes": 2673575498,
"cpu_s": 47.416435,
"external_http_count": 32
}
Impact
- Exceeds the threshold 10s execution time for high urgency jobs - these thresholds are defined in-app.
- This leads to a backup of the whole high urgency cpu-bound queue and delays of all high urgency cpu-bound jobs on the instance of up to 10 minutes. The acceptable threshold is 10s for all other jobs in this queue also. So this is quite a significant delay.
- This happens roughly 3 times per week.
Recommendation
We can provide more detailed logs for Dedicated, or even temporary log access to application developers to further investigation.
Recommendation to fix is to do one of:
- Reduce the execution time at these peaks.
- Break down the slower jobs into multiple smaller jobs.
- Move the slower jobs to the low urgency queues instead of high-urgency, allowing slower execution time.