Sidekiq metrics exporter metrics are broken
It looks like !81983 (merged) broke the metrics we export about the Sidekiq metrics exporter process.
I think the problem here is:
- sidekiq-cluster spawns a metrics server process, which wipes the metrics_dir, starts its own Ruby sampler, then starts one or more workers
- when the first worker boots, it wipes the metrics dir again in 7_prometheus
The problem is that Sidekiq executes things in a different order than Puma: sidekiq-cluster is not a Rails process, so it does not know when workers are fully started, and it has no way to communicate with them other than by sending signals.
To fix this, we should consider:
- Not wiping metrics in the initializer for Sidekiq since that again arbitrarily executes logic in only one Sidekiq process on behalf of all others, which leads to race conditions. It is OK for Puma because with Puma, there is a dedicated primary that starts before any workers do.
- Moving this logic into the sidekiq-cluster parent script (cli.rb), before either the metrics server or any worker is started.
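A minimal sketch of the second option: wipe the metrics dir exactly once in the parent, before any child is started. This is not the actual cli.rb code. `wipe_metrics_dir` and `start_cluster` are hypothetical names, the two lambdas stand in for spawning the real metrics server and worker processes, and storing metrics as `*.db` files directly in the directory is an assumption.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: delete stale metric files left over from a previous run.
# Assumes metrics are stored as *.db files directly inside `dir`.
# Returns the number of files that were removed.
def wipe_metrics_dir(dir)
  FileUtils.mkdir_p(dir)
  stale = Dir.glob(File.join(dir, '*.db'))
  FileUtils.rm_f(stale)
  stale.size
end

# Hypothetical parent start-up sequence: wipe once, then start the children.
# Because the wipe happens before anything is spawned, no child can delete
# files that another child has already written.
def start_cluster(metrics_dir:, start_metrics_server:, start_worker:, worker_count: 1)
  wipe_metrics_dir(metrics_dir)
  start_metrics_server.call(metrics_dir)
  worker_count.times { |index| start_worker.call(metrics_dir, index) }
end

# Usage: the lambdas only create files, to show which ones survive start-up.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'stale.db'))
  start_cluster(
    metrics_dir: dir,
    start_metrics_server: ->(d) { FileUtils.touch(File.join(d, 'exporter.db')) },
    start_worker: ->(d, i) { FileUtils.touch(File.join(d, "worker_#{i}.db")) },
    worker_count: 2
  )
  puts Dir.children(dir).sort.inspect
end
```

With this ordering, the stale file from the previous run is gone and the file written by the metrics server is still present after the workers have started. If each worker wiped the directory on boot instead, the exporter's file would be deleted again.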
Edited by Matthias Käppler