Redefine the Sidekiq Execution & Queueing SLIs to use new counters

With this issue we mean to use the metrics added in #163 (closed).

We currently have one Sidekiq SLI per shard that combines queueing and execution. We need to split this into a queueing SLI per shard and a single execution SLI covering all shards.
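To make the split concrete, here is a minimal Python sketch of how the two SLIs could be derived from success/total counters. The `CounterSample` type, the `kind` field, and the shard names are hypothetical and only illustrate the aggregation; they are not the actual metric or label names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """One counter reading (hypothetical shape, for illustration only)."""
    shard: str    # e.g. "catchall"; shard names here are made up
    kind: str     # "queueing" or "execution"
    success: int  # operations that met their threshold
    total: int    # all operations


def queueing_sli_per_shard(samples):
    """Return {shard: success ratio} for queueing, one value per shard."""
    success = defaultdict(int)
    total = defaultdict(int)
    for s in samples:
        if s.kind == "queueing":
            success[s.shard] += s.success
            total[s.shard] += s.total
    # Shards without any queueing traffic are left out rather than reported as 0.
    return {shard: success[shard] / total[shard]
            for shard in total if total[shard] > 0}


def execution_sli_all_shards(samples):
    """Return one success ratio for execution, summed over every shard."""
    success = sum(s.success for s in samples if s.kind == "execution")
    total = sum(s.total for s in samples if s.kind == "execution")
    return success / total if total else None
```

Summing the counters before dividing weights the execution SLI by traffic, so a quiet shard cannot dominate the aggregate.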

This means that we'll continue to monitor queueing per shard: when a shard gets swamped with jobs, this is likely to be most noticeable in the queueing SLI. These SLIs remain owned by infrastructure.

The new execution SLI uses the counters from #1638 (closed) to measure execution performance. I think we should still have per-shard alerting; we could use an approach similar to the per-node alerts we have for Gitaly.
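One possible reading of per-shard alerting on top of the aggregate execution SLI is sketched below in Python: the same counters are evaluated per shard, and a shard is flagged only when it has enough traffic for the ratio to be meaningful. The function name, the `slo` value, and the `min_total` cutoff are assumptions made for illustration, and this is not a description of how the Gitaly per-node alerts are actually implemented.

```python
def shards_breaching(execution_by_shard, slo=0.995, min_total=100):
    """Return the shards whose execution success ratio is below `slo`.

    `execution_by_shard` maps shard name -> (success, total) counter values.
    `slo` and `min_total` are illustrative defaults, not real thresholds.
    """
    breaching = []
    for shard, (success, total) in sorted(execution_by_shard.items()):
        if total < min_total:
            # Too little traffic: skip to avoid alerting on noise.
            continue
        if success / total < slo:
            breaching.append(shard)
    return breaching
```

The aggregate SLI stays the thing that is reported on, while a check like this points at the specific shard that is dragging it down.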

Edited by Bob Van Landuyt