Do not count requeued jobs in sidekiq_enqueued_jobs_total

While rolling out load balancing for Sidekiq workers, we noticed that our "Jobs enqueued per service" metric more than doubled for some of these workers:

https://dashboards.gitlab.net/d/sidekiq-queue-detail/sidekiq-queue-detail?viewPanel=17&orgId=1&from=now-12h&to=now&var-PROMETHEUS_DS=Global&var-environment=gprd&var-stage=main&var-queue=pipeline_cache:expire_pipeline_cache&refresh=30s

We established that this happens because workers are now sticky or delayed: we internally inject a 1-second delay via perform_in, which causes Sidekiq to re-queue these jobs for future execution. Unfortunately, this means the metric above no longer correctly reflects how many jobs we intended to run, since every such job is effectively counted twice.
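To illustrate the double counting, here is a minimal, self-contained sketch. It does not use Sidekiq itself: `NaiveEnqueueCounter`, `ExampleWorker`, and the queue name are hypothetical stand-ins, and the second `call` only imitates what an internal perform_in re-queue would look like to a client middleware.

```ruby
# Hypothetical stand-in for a client middleware that counts every push,
# regardless of why the job is being pushed.
class NaiveEnqueueCounter
  attr_reader :count

  def initialize
    @count = 0
  end

  # Same argument shape as a Sidekiq client middleware `call`.
  def call(_worker_class, _job, _queue, _redis_pool = nil)
    @count += 1
    yield
  end
end

counter = NaiveEnqueueCounter.new

# 1. The application enqueues the job once (the "intended" enqueue).
job = { 'class' => 'ExampleWorker', 'args' => [42] }
counter.call(job['class'], job, 'example_queue') { job }

# 2. The job is pushed again with a one-second delay, imitating the
#    internal perform_in re-queue. The middleware sees a second push.
delayed_job = job.merge('at' => Time.now.to_f + 1)
counter.call(delayed_job['class'], delayed_job, 'example_queue') { delayed_job }

puts counter.count
```

One intended job results in two increments, which matches the roughly doubled values on the dashboard.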

The dashboard above uses a recording rule, which in turn is based on the sidekiq_enqueued_jobs_total metric. We emit this metric from the Sidekiq client middleware. We should find a way to discern whether a job is being enqueued intentionally or because of some internal mechanism such as "run later", and only increment this counter in the former case.
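One possible approach, sketched below under stated assumptions: the code that performs the internal re-queue tags the job payload with a marker, and the client middleware skips the counter when it sees that marker. The `internal_requeue` key, the `EnqueuedJobsCounter` class, and the in-memory hash standing in for the Prometheus counter are all hypothetical; this is not how the metric is currently emitted.

```ruby
# Hypothetical client middleware that only counts intentional enqueues.
class EnqueuedJobsCounter
  # Assumed marker key, set by whatever code re-queues the job internally.
  REQUEUE_MARKER = 'internal_requeue'

  attr_reader :counts

  def initialize
    # In-memory stand-in for sidekiq_enqueued_jobs_total, keyed by queue.
    @counts = Hash.new(0)
  end

  # Same argument shape as a Sidekiq client middleware `call`.
  def call(_worker_class, job, queue, _redis_pool = nil)
    @counts[queue] += 1 unless job[REQUEUE_MARKER]
    yield
  end
end

middleware = EnqueuedJobsCounter.new

# Intentional enqueue by the application: counted.
original = { 'class' => 'ExampleWorker', 'args' => [1] }
middleware.call(original['class'], original, 'example_queue') { original }

# Internal "run later" re-queue: tagged with the marker, so not counted.
requeued = original.merge('at' => Time.now.to_f + 1,
                          EnqueuedJobsCounter::REQUEUE_MARKER => true)
middleware.call(requeued['class'], requeued, 'example_queue') { requeued }

puts middleware.counts['example_queue']
```

The marker travels with the job payload, so the client middleware needs no extra state to tell the two cases apart; the trade-off is that every internal re-queue path has to remember to set it.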

Edited Jun 15, 2021 by Matthias Käppler