UpdateMergeRequestsWorker can be expensive and exceed its execution SLO thresholds

Resolution

This issue was opened to address the long-running background job UpdateMergeRequestsWorker. We have been making incremental improvements since %13.4. These include adding indexes in !57691 (merged), preloading data in !53802 (merged), making some calls asynchronous in !58542 (merged), reducing Gitaly calls in !53536 (merged), reducing loops in the service in !40135 (merged), and a few other smaller changes.

When the issue was created (see image below), the Execution Apdex was often below 99%; it now averages 100%. The average execution time is now 2.66s, and the maximum execution time over the past 30 days is 45s, down from 900s.

Original Issue

Some UpdateMergeRequestsWorker jobs can consume up to 900s of CPU in a single execution.

These jobs are marked as urgency=high and have an execution SLO of 10 seconds, but some of these jobs can take up to 20 minutes to run.
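The Execution Apdex mentioned in the resolution measures jobs against this threshold. The sketch below is a simplified model, assuming the apdex is the fraction of executions that finish within the 10-second SLO. GitLab's real metric is computed from Prometheus histograms over time windows, so this is not its actual pipeline, and `execution_apdex` is a hypothetical helper. The durations in the example are made up for illustration.

```python
# Simplified model of an execution apdex (hypothetical helper, not GitLab's code).
EXECUTION_SLO_SECONDS = 10.0  # urgency=high execution SLO stated in the issue


def execution_apdex(durations_s, threshold_s=EXECUTION_SLO_SECONDS):
    """Return the fraction of job executions that finished within threshold_s."""
    if not durations_s:
        raise ValueError("need at least one job duration")
    within_slo = sum(1 for d in durations_s if d <= threshold_s)
    return within_slo / len(durations_s)


# Hypothetical durations in seconds: one slow outlier among four jobs.
sample = [2.1, 3.0, 45.0, 1.2]
print(execution_apdex(sample))  # 3 of 4 jobs met the SLO
```

Because the apdex is a ratio, a rare slow outlier among many fast jobs barely moves it. This is why a 45s maximum can coexist with an apdex that rounds to 100%.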

https://dashboards.gitlab.net/d/sidekiq-queue-detail/sidekiq-queue-detail?orgId=1&from=now-6h&to=now&var-PROMETHEUS_DS=Global&var-environment=gprd&var-stage=main&var-shard=urgent-cpu-bound&var-queue=update_merge_requests

Search via Kibana

Edited Jun 22, 2021 by Marc Shaw