Do not count requeued jobs in sidekiq_enqueued_jobs_total
While rolling out load balancing for Sidekiq workers, we noticed that our "Jobs enqueued per service" metric more than doubled for some services.
We established that this happens because, now that workers can be sticky or delayed, we internally inject a 1-second delay via perform_in, which leads Sidekiq to re-queue these jobs for future execution. Unfortunately, this means the above metric no longer correctly reflects how many jobs we intended to run, since every such job is effectively counted twice.
The dashboard above uses a recording rule, which in turn records the sidekiq_enqueued_jobs_total metric. We emit this metric from the Sidekiq client middleware. We should find a way to discern whether a job is being enqueued intentionally or because of some internal mechanism such as "run later", and only increment this counter in the former case.
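
One possible direction, as a rough sketch only: have the client middleware mark the job payload the first time it counts it, and skip the increment when it sees the marker again on the re-queue. The payload key enqueue_counted, the EnqueuedJobsCounter class, and the FakeCounter stand-in for the real Prometheus counter are all hypothetical names invented for this illustration. The sketch assumes that keys written to the job hash in client middleware are persisted with the payload and are still present when Sidekiq pushes the job again.

```ruby
# Hypothetical stand-in for the real counter; only #increment is assumed.
class FakeCounter
  attr_reader :value

  def initialize
    @value = 0
  end

  def increment(by = 1)
    @value += by
  end
end

# Sketch of a client middleware using Sidekiq's client middleware call
# signature. COUNTED_KEY is an invented payload key, not a Sidekiq field.
class EnqueuedJobsCounter
  COUNTED_KEY = "enqueue_counted"

  def initialize(counter)
    @counter = counter
  end

  def call(_worker_class, job, _queue, _redis_pool)
    # Count only the first time this payload passes through the middleware.
    unless job[COUNTED_KEY]
      @counter.increment
      job[COUNTED_KEY] = true
    end
    yield
  end
end

counter = FakeCounter.new
middleware = EnqueuedJobsCounter.new(counter)

# First pass: the job is scheduled with a delay, so "at" is set.
job = { "class" => "SomeWorker", "jid" => "abc123", "at" => Time.now.to_f + 1 }
middleware.call("SomeWorker", job, "default", nil) { job }

# Second pass: the same payload is pushed again once it is due.
job.delete("at")
middleware.call("SomeWorker", job, "default", nil) { job }

puts counter.value
```

A marker like this would also suppress the increment for any other internal re-push of the same payload (for example retries, if those pass through the client middleware too), so whether that is desirable would need to be decided. An alternative would be to keep counting every push but add a label that distinguishes immediate from delayed enqueues.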