Fix: Sidekiq workers delete each other's metrics

Matthias Käppler requested to merge 336311-fix-missing-sidekiq-metrics into master

What does this MR do?

Fixes #336311 (closed)

When we moved the logic that wipes the Prometheus metrics directory out of the Rackup file (config.ru) and into the initializer, every Sidekiq worker started calling it on boot. This opened up a race condition in which workers could delete each other's metrics database files.

Since config.ru is executed only by Puma, and since the call is guarded so that it runs only for the primary, this should no longer happen.
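
The guard described above can be illustrated with a minimal sketch. This is not the code from this MR: `wipe_metrics_dir` and its `primary:` flag are hypothetical names chosen for illustration, and the real check for "am I the primary?" lives in GitLab's own runtime helpers. The sketch only shows the idea that a process which is not the primary must leave the `.db` files alone.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not GitLab's actual implementation): deletes the
# metrics *.db files in `dir`, but only when the caller is the primary.
# Returns the sorted base names of the files it removed.
def wipe_metrics_dir(dir, primary:)
  # Any non-primary process (for example a Sidekiq worker) is a no-op,
  # so it can never delete files that another process is writing to.
  return [] unless primary

  stale = Dir.glob(File.join(dir, '*.db'))
  FileUtils.rm_f(stale)
  stale.map { |path| File.basename(path) }.sort
end

# Usage: a non-primary call leaves the file in place, a primary call removes it.
Dir.mktmpdir do |dir|
  db = File.join(dir, 'counter_sidekiq_0-0.db')
  FileUtils.touch(db)

  wipe_metrics_dir(dir, primary: false)
  puts File.exist?(db)

  wipe_metrics_dir(dir, primary: true)
  puts File.exist?(db)
end
```

With a guard of this shape, concurrent worker startups cannot interfere with each other, because only one process is ever allowed to perform the wipe.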

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Tested with gitlab-ee:52edbc6e312c826c5b47f5cf65d60adee770b031 and verified that all Prometheus db files remained intact across restarts:

root@local:/# ll /dev/shm/gitlab/sidekiq
total 10856
drwx------ 2 git  root     280 Jul 20 12:53 ./
drwxr-xr-x 4 root root      80 Jul 20 12:33 ../
-rw-r--r-- 1 git  git     4096 Jul 20 12:53 counter_sidekiq_0-0.db
-rw-r--r-- 1 git  git     4096 Jul 20 12:53 counter_sidekiq_1-0.db
-rw-r--r-- 1 git  git     8192 Jul 20 12:54 counter_sidekiq_2-0.db
-rw-r--r-- 1 git  git     8192 Jul 20 12:54 gauge_all_sidekiq_0-0.db
-rw-r--r-- 1 git  git     8192 Jul 20 12:54 gauge_all_sidekiq_1-0.db
-rw-r--r-- 1 git  git     8192 Jul 20 12:54 gauge_all_sidekiq_2-0.db
-rw-r--r-- 1 git  git     4096 Jul 20 12:53 gauge_max_sidekiq_0-0.db
-rw-r--r-- 1 git  git     4096 Jul 20 12:53 gauge_max_sidekiq_1-0.db
-rw-r--r-- 1 git  git     4096 Jul 20 12:53 gauge_max_sidekiq_2-0.db
-rw-r--r-- 1 git  git  4194304 Jul 20 12:54 histogram_sidekiq_0-0.db
-rw-r--r-- 1 git  git  4194304 Jul 20 12:54 histogram_sidekiq_1-0.db
-rw-r--r-- 1 git  git  4194304 Jul 20 12:54 histogram_sidekiq_2-0.db
Edited by Matthias Käppler