
Run SidekiqExporter only on first worker

Matthias Käppler requested to merge 345794-sidekiq-cluster-leader into master

What does this MR do and why?

When using the SidekiqExporter via gitlab.monitoring.sidekiq_exporter.enable = true and running more than 1 worker in sidekiq-cluster, there is a race condition where all workers try to bind to the same port to serve metrics and health-checks. We are currently using a workaround that lets the remaining N-1 workers fall into a rescue clause when they fail to allocate that port.
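To illustrate the failure mode, here is a minimal sketch of what happens when a second process tries to bind a port that is already taken. It is not the exporter's actual code: try_bind is a hypothetical helper, and the sketch assumes the bind error surfaces as Errno::EADDRINUSE, as it does with Ruby's TCPServer.

```ruby
require 'socket'

# Hypothetical helper: try to listen on the given port and return the
# server socket, or nil when another process already owns the port.
def try_bind(port, host = '127.0.0.1')
  TCPServer.new(host, port)
rescue Errno::EADDRINUSE
  # This is the path the N-1 "losing" workers take under the workaround.
  nil
end

# Port 0 asks the OS for any free port, so the sketch does not depend on
# a specific metrics port being available.
first = TCPServer.new('127.0.0.1', 0)
port = first.addr[1]

second = try_bind(port)
puts second.nil? ? "port #{port} already allocated, skipping exporter" : 'bound'

second&.close
first.close
```

Which worker wins the bind depends on startup timing, which is why the outcome is unpredictable across restarts.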

We can address this problem by letting sidekiq-cluster elect a "leader" among all workers, for instance sidekiq_0 (the first worker launched), which will take sole responsibility for the above. All other workers should not attempt to bind ports, serve metrics, or do anything of the sort.

In environments where only 1 worker is used, that worker will lead implicitly.

This makes for a more predictable environment where multiple sidekiq workers are present.

Implementation

I went with the simplest thing I could think of:

  • When running a single worker via bundle exec, this worker will be exporting metrics
  • When running > 1 worker via sidekiq-cluster, only sidekiq_0 will export metrics

This required the least amount of machinery because we already pass around worker IDs through the environment. If there is no worker ID, we know we're not operating in a cluster of processes.
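The rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this MR: the method name metrics_leader?, the environment variable name SIDEKIQ_WORKER_ID, and the exact ID format sidekiq_0 are all assumed here for the sake of the example.

```ruby
# Assumed names, for illustration only.
WORKER_ID_KEY = 'SIDEKIQ_WORKER_ID'
LEADER_ID = 'sidekiq_0'

# Returns true if this process should export metrics.
# env is injectable so the rule can be checked without touching ENV.
def metrics_leader?(env = ENV)
  worker_id = env[WORKER_ID_KEY]

  # No worker ID: we are not running under sidekiq-cluster, so the single
  # worker leads implicitly. Otherwise only the first worker leads.
  worker_id.nil? || worker_id == LEADER_ID
end

puts metrics_leader?({})                            # single worker
puts metrics_leader?({ WORKER_ID_KEY => LEADER_ID })   # first cluster worker
puts metrics_leader?({ WORKER_ID_KEY => 'sidekiq_1' }) # any other worker
```

Because the decision depends only on the process environment, it needs no coordination between workers and no shared state.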

I decided not to put this behind a feature flag because feature flag checks have caused problems in the past when performed in initializers that run early in the initializer chain (such as 7_prometheus). I think the change is pretty simple and safe.

How to set up and validate locally

Scenario 1: Single worker, no sidekiq-cluster

  1. Run bundle exec sidekiq
  2. curl localhost:<metrics_port>/metrics -- it should serve metrics

Scenario 2: Multiple workers via sidekiq-cluster

  1. Run bin/background_jobs
  2. curl localhost:<metrics_port>/metrics -- it should serve metrics

No worker should ever fail with "can't bind <metrics_port> - already allocated"

Note that <metrics_port> depends on your dev env and local settings. For me using the GCK it is 3807 by default.

Test in review app

Find the sidekiq pod and exec into it using -it -- bash

Run

git@review-345794-sid-mlslqy-sidekiq-all-in-1-v1-75b9b47b6-pz6m2:/$ curl -s localhost:3807/metrics | wc -l
27178

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #345794 (closed)

Edited by Matthias Käppler
