Record sidekiq enqueue counters in Prometheus
Currently we record execution rates and latencies in Prometheus, but not enqueue rates.
This can make it tricky to diagnose issues such as https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/8824
We should add metrics for enqueue counters too. These should use the same labels as the sidekiq execution counters.
Once these are in place, we can add key metrics by replacing the TODOs in the sidekiq file in the metrics catalog: https://gitlab.com/gitlab-com/runbooks/blob/master/metrics-catalog/services/sidekiq.jsonnet#L40
Having enqueue rate metrics would also help us decide whether enqueue rates should factor into our SLAs (for example, if the application enqueues a large number of jobs, does the queue time SLA increase?). See #86 (comment 266907087) for further discussion on this.
Proposal
- Add a sidekiq client middleware for intercepting new sidekiq job creation calls, incrementing counters accordingly.
- Reuse labels from https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/sidekiq_middleware/metrics.rb in this, which may need some refactoring to reuse the code in a client middleware as well as the existing server middleware.
- Update https://gitlab.com/gitlab-com/runbooks/blob/master/metrics-catalog/services/sidekiq.jsonnet#L40 to include a `rateMetric` on this new counter.
- Possibly create a follow-up issue to review if we need other metrics to help diagnose issues.
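
A minimal sketch of what the client middleware from the first proposal item could look like. It runs without the sidekiq or Prometheus client gems: `InMemoryCounter` is a hypothetical stand-in for a real counter, and the class name `EnqueueMetricsClientMiddleware` and the `queue`/`worker` labels are assumptions for illustration. The real labels should come from the existing server middleware. The `call(worker_class, job, queue, redis_pool)` signature with a `yield` is the one sidekiq uses for client middleware.

```ruby
# Hypothetical in-memory stand-in for a Prometheus counter, so this sketch
# runs on its own. A real implementation would use the application's
# existing metrics helpers instead.
class InMemoryCounter
  def initialize
    @values = Hash.new(0)
  end

  # Add `by` to the value stored for this label set.
  def increment(labels = {}, by = 1)
    @values[labels] += by
  end

  # Current value for this label set (0 if never incremented).
  def get(labels = {})
    @values[labels]
  end
end

# Sketch of a client middleware: sidekiq invokes `call` for each job being
# pushed, and the push only continues if the block is yielded to and the
# middleware returns a truthy value.
class EnqueueMetricsClientMiddleware
  def initialize(counter)
    @counter = counter
  end

  def call(worker_class, _job, queue, _redis_pool)
    result = yield
    # Count only jobs that later middleware did not halt.
    @counter.increment(labels(worker_class, queue)) if result
    result
  end

  private

  # Assumed label set; replace with the labels shared with the server middleware.
  def labels(worker_class, queue)
    { queue: queue.to_s, worker: worker_class.to_s }
  end
end

# Simulate one enqueue without a running sidekiq.
counter = InMemoryCounter.new
middleware = EnqueueMetricsClientMiddleware.new(counter)
result = middleware.call('ExampleWorker', { 'args' => [] }, 'default', nil) { :pushed }
```

In an application, a middleware like this would be registered in the client middleware chain inside `Sidekiq.configure_client`. Incrementing after `yield` means jobs dropped by a later client middleware are not counted; whether that is the desired behaviour is a design choice for the actual implementation.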