Skip to content

2025-08-27: Sidekiq execution error rate SLO violation on quarantine shard

Sidekiq execution error rate SLO violation on quarantine shard (Severity 3 (Medium))

Problem: The sidekiq_execution component on the quarantine shard showed an error rate of 8.943%, surpassing its service level objective for job execution duration.

Impact: Not known yet

Causes: The incident was caused by a change introduced to Canary, but not to Sidekiq, due to Sidekiq deploys being only available for the main stage.

Response strategy: We re-enabled canary to resume promoting to production, applied a temporary alert silence for the quarantine shard, updated a merge request introducing a feature flag for ReactiveCaching, communicated the deployment to the production team, and monitored the deployment until the Sidekiq execution error rate returned to zero.


This ticket was created to track INC-3528, by incident.io 🔥