2025-09-11: Sidekiq queueing SLO violation on catchall shard
Sidekiq queueing SLO violation on catchall shard (Severity 4 (Low))
Problem: A sudden spike in Sidekiq jobs caused a temporary violation of queueing duration targets on the catchall shard.
Impact: Some Sidekiq jobs on the catchall shard were delayed in starting, and the queueing apdex dropped to 99.39% over a 6-hour window. All systems recovered automatically once the spike subsided, and no ongoing issues have been detected.
Causes: A cron job called ResourceAccessTokens::InactiveTokensDeletionCronWorker triggered tens of thousands of MergeRequests::RemoveUserApprovalRulesWorker jobs in a short window, which caused a spike in Sidekiq activity and led to temporary queueing delays.
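A minimal sketch of one way this kind of fan-out could be spread over time rather than enqueued all at once. It is not the fix applied in this incident and not the actual worker code. The helper name staggered_batches, the batch size, and the interval are hypothetical, and the script only prints a schedule instead of calling Sidekiq.

```ruby
# Hypothetical helper: split a list of IDs into fixed-size batches and
# give each batch a progressively larger delay, so the downstream jobs
# arrive in the queue gradually instead of in one burst.
def staggered_batches(ids, batch_size:, interval_seconds:)
  ids.each_slice(batch_size).each_with_index.map do |batch, index|
    { delay: index * interval_seconds, ids: batch }
  end
end

# Illustrative values only; real numbers would depend on shard capacity.
schedule = staggered_batches((1..10).to_a, batch_size: 4, interval_seconds: 30)

schedule.each do |entry|
  # In a real Sidekiq worker, each ID could be scheduled with something like
  # SomeWorker.perform_in(entry[:delay], id). Here we only print the plan.
  puts "delay=#{entry[:delay]}s ids=#{entry[:ids].inspect}"
end
```

With a schedule like this, the parent cron job would still enqueue the same total number of jobs, but the arrival rate on the shard would be bounded by the batch size and interval.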
Response strategy: No manual intervention was needed. The queueing SLO alert resolved itself after the activity spike subsided.
This ticket was created by incident.io to track INC-3859.