Investigate degraded performance of AuthorizedProjectsWorker
Summary
The performance of the AuthorizedProjectsWorker appears to have degraded in recent weeks. This can be seen in our apdex measurements for the sidekiq_execution SLI, specifically on the urgent-authorized-projects shard: the apdex spends noticeably less time at 100% after August 22nd.
Impact
This has resulted in the sidekiq_execution SLI of the urgent-authorized-projects Sidekiq shard repeatedly violating its SLO and spamming the EOC with alerts. This particular SLI/shard has paged the EOC 23 times since August 22nd. In each case there was no action the EOC could take except wait for the jobs to complete.
The reliability issue is here: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24341. It was first raised here: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24230#note_1522722386
Recommendation
Investigate the cause of the performance degradation in the AuthorizedProjectsWorker and rectify it.
Verification
Monitor the apdex after the fix is deployed. It should be more stable and should no longer page the EOC.
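To make the verification criterion concrete, here is a minimal sketch of an apdex check. It assumes that apdex is the fraction of jobs finishing within a duration threshold. The function names, the 10-second threshold, and the 0.99 SLO target are all hypothetical, chosen for illustration only. They are not taken from the actual monitoring configuration.

```python
from typing import Sequence


def apdex(durations_s: Sequence[float], threshold_s: float) -> float:
    """Return the fraction of jobs that finished within threshold_s.

    An empty window is treated as fully satisfied (1.0), since no job was slow.
    """
    if not durations_s:
        return 1.0
    satisfied = sum(1 for d in durations_s if d <= threshold_s)
    return satisfied / len(durations_s)


def violates_slo(durations_s: Sequence[float], threshold_s: float, slo: float) -> bool:
    """True when the apdex for this window falls below the SLO target."""
    return apdex(durations_s, threshold_s) < slo


# Hypothetical values, for illustration only.
THRESHOLD_S = 10.0
SLO = 0.99

# One slow job out of four drags the apdex for this window below the target.
window = [1.2, 2.5, 20.0, 3.1]
print(apdex(window, THRESHOLD_S))
print(violates_slo(window, THRESHOLD_S, SLO))
```

After the fix, windows containing AuthorizedProjectsWorker jobs should stop falling below the target under a check like this one.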