# Slow execution of Sidekiq job `Ci::SyncReportsToReportApprovalRulesWorker`
## Summary

We have a regularly recurring issue on the dedicated tenant `still_silver_kangaroo`: the execution time of the high urgency, CPU-bound Sidekiq job `Ci::SyncReportsToReportApprovalRulesWorker` spikes and delays the whole high urgency CPU-bound queue, impacting the wider instance. There is no infrastructure saturation at these times, so this looks like an application-side performance issue.

Example log from a slow job:

<details>
<summary>Sanitised log extract</summary>

```json
{
  "correlation_id": "01KH7EW2JPP0YMVRYGA7MPK5M4",
  "meta.caller_id": "Security::SyncProjectPolicyWorker",
  "severity": "INFO",
  "queue": "urgent_cpu_bound",
  "@timestamp": "2026-02-11T23:05:42.040193372Z",
  "duration_s": 216.172466,
  "queue_duration_s": 0.007047,
  "scheduling_latency_s": 0.007125,
  "db_duration_s": 35.921989,
  "db_primary_duration_s": 35.92,
  "db_replica_duration_s": 0,
  "db_primary_txn_duration_s": 0,
  "db_replica_txn_duration_s": 0,
  "db_primary_txn_max_duration_s": 0,
  "db_replica_txn_max_duration_s": 0,
  "db_main_duration_s": 29.346,
  "db_ci_duration_s": 6.574,
  "db_main_replica_duration_s": 0,
  "db_ci_replica_duration_s": 0,
  "db_main_txn_duration_s": 0.027,
  "db_ci_txn_duration_s": 0,
  "db_main_replica_txn_duration_s": 0,
  "db_ci_replica_txn_duration_s": 0,
  "db_main_txn_max_duration_s": 0.023,
  "db_ci_txn_max_duration_s": 0,
  "db_main_replica_txn_max_duration_s": 0,
  "db_ci_replica_txn_max_duration_s": 0,
  "redis_duration_s": 1.233852,
  "redis_cache_duration_s": 0.32225,
  "redis_queues_duration_s": 0.004107,
  "redis_queues_metadata_duration_s": 0.184265,
  "redis_shared_state_duration_s": 0.72323,
  "external_http_duration_s": 8.20343656919431,
  "exclusive_lock_wait_duration_s": 0,
  "exclusive_lock_hold_duration_s": 0,
  "redis_calls": 91,
  "redis_cache_calls": 37,
  "redis_queues_calls": 13,
  "redis_queues_metadata_calls": 10,
  "redis_shared_state_calls": 31,
  "redis_read_bytes": 378835,
  "redis_write_bytes": 12089,
  "redis_cache_read_bytes": 378547,
  "redis_cache_write_bytes": 2294,
  "redis_queues_read_bytes": 132,
  "redis_queues_write_bytes": 6665,
  "redis_queues_metadata_read_bytes": 14,
  "redis_queues_metadata_write_bytes": 1657,
  "redis_shared_state_read_bytes": 142,
  "redis_shared_state_write_bytes": 1473,
  "job_size_bytes": 8,
  "mem_bytes": 703123898,
  "mem_total_bytes": 2673575498,
  "cpu_s": 47.416435,
  "external_http_count": 32
}
```

</details>

## Impact

1. Execution time exceeds the 10 s threshold for [high urgency jobs](https://docs.gitlab.com/development/sidekiq/worker_attributes/); these thresholds are defined in-app.
2. This backs up the whole high urgency CPU-bound queue and delays all high urgency CPU-bound jobs on the instance by up to 10 minutes. The 10 s threshold applies to every other job in this queue as well, so this is a significant delay.
3. This happens roughly 3 times per week.

## Recommendation

We can provide more detailed logs from Dedicated, or even temporary log access, to application developers for further investigation.

We recommend one of the following fixes:

1. Reduce the execution time at these peaks.
2. Break the slower jobs down into multiple smaller jobs.
3. Move the slower jobs from the high urgency to the low urgency queues, allowing a longer execution time.
## Optimization Plan: `Ci::SyncReportsToReportApprovalRulesWorker`

---

**1. Move to the low urgency queue** *(Highest impact — addresses the stated incident)*

Changed `urgency :high` → `urgency :low`. The job no longer competes in the `urgent_cpu_bound` queue, which directly eliminates the queue backup impact.

**2. Added `defer_on_database_health_signal`**

Defers the job by 1 minute when the `approval_merge_request_rules` table is under load, preventing cascading DB pressure during peaks.

**3. Rewrote `opened_merge_requests_with_head_sha`**

- **Before:** `all_merge_requests.opened.select { |mr| mr.diff_head_pipeline?(self) }` — loaded all MRs into Ruby and filtered them in-process. The source of the `mem_bytes: 703123898` (~670 MiB) in the log.
- **After:** a DB-side `EXISTS` subquery that only fetches matching MRs. Eliminates the memory spike and reduces `db_duration_s`.

**4. Memoized `target_branch_report` in `UpdateLicenseApprovalsService`**

- **Before:** fetched the license scan report from object storage once per approval rule per MR.
- **After:** `strong_memoize_with(:target_branch_report_by_pipeline, pipeline.id)` caches it per pipeline. Addresses most of the `external_http_count: 32` and `external_http_duration_s: 8.2s` in the log.

**5. Changed `worker_resource_boundary :cpu` → `:unknown`** *(Routing correctness)*

The work is primarily DB I/O bound, not CPU-bound; this routes the job to the correct Sidekiq process pool.

Hedged sketches of changes 1–5 are included after the MR link.

### MR link

https://gitlab.com/gitlab-org/gitlab/-/merge_requests/231984/diffs
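### Illustrative sketches

A minimal sketch of the worker attribute changes (items 1, 2, and 5), assuming GitLab's standard `ApplicationWorker` DSL. Everything except the three attributes named in the plan (`urgency`, `worker_resource_boundary`, `defer_on_database_health_signal`) is illustrative rather than taken from the actual worker source:

```ruby
# Sketch only: attribute values other than those named in the plan are assumptions.
module Ci
  class SyncReportsToReportApprovalRulesWorker
    include ApplicationWorker

    urgency :low                      # item 1: was `urgency :high`
    worker_resource_boundary :unknown # item 5: was `worker_resource_boundary :cpu`

    # Item 2: re-enqueue with a 1-minute delay while database health indicators
    # for the named table on the main database report pressure.
    defer_on_database_health_signal :gitlab_main, [:approval_merge_request_rules], 1.minute

    def perform(pipeline_id)
      # existing report-sync logic, unchanged
    end
  end
end
```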
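A sketch of the `opened_merge_requests_with_head_sha` rewrite (item 3). How `diff_head_pipeline?` translates to SQL is an assumption here: the sketch leans on the standard `merge_requests.latest_merge_request_diff_id` and `merge_request_diffs.head_commit_sha` columns, and the MR diff should be treated as authoritative.

```ruby
# In Ci::Pipeline: filter in PostgreSQL instead of instantiating every open MR
# and calling mr.diff_head_pipeline?(self) in Ruby.
def opened_merge_requests_with_head_sha
  all_merge_requests.opened.where(
    'EXISTS (SELECT 1 FROM merge_request_diffs ' \
    'WHERE merge_request_diffs.id = merge_requests.latest_merge_request_diff_id ' \
    'AND merge_request_diffs.head_commit_sha = ?)',
    sha
  )
end
```

Only merge requests whose latest diff head matches the pipeline SHA are fetched, which removes both the per-row model instantiation behind the memory spike and the Ruby-side filtering behind the long `db_duration_s`.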
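A sketch of the memoization in `UpdateLicenseApprovalsService` (item 4), using `Gitlab::Utils::StrongMemoize`. The `license_scanning_report` call in the block is a hypothetical stand-in for the actual object-storage fetch:

```ruby
# Sketch: fetch the target branch license report once per pipeline so that
# N approval rules across M merge requests trigger one fetch, not N x M.
include Gitlab::Utils::StrongMemoize

def target_branch_report(pipeline)
  strong_memoize_with(:target_branch_report_by_pipeline, pipeline.id) do
    pipeline.license_scanning_report # stand-in for the object-storage fetch
  end
end
```

`strong_memoize_with` keys the cached value by its arguments, so reports for different pipelines handled by the same service instance don't collide.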