fix: Defer Gitlab::Instrumentation::ConnectionPool until after reinitialize_on_pid_change
What does this MR do and why?
While looking at Puma worker metrics, some of them appear to be missing,
in particular gauges from `RubySampler`, e.g. `ruby_file_descriptors` (thanos).
We only have values for the pid `puma_master`, none for `puma_0`, `puma_1`, etc.
The corresponding Prometheus mmap file prefix is `gauge_all`. It appears
we are not properly resetting `MmapedValue` after forking the Puma
worker. Thus, workers continue writing to the mmapped file belonging to
the `puma_master`, likely competing with and continuously overwriting each
other.
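The failure mode described above can be illustrated with a minimal sketch. `CachedGauge` is a hypothetical stand-in, not the real `MmapedValue`, and the file naming is simplified; it only assumes that the target file name is derived from the process name once and then cached, so a forked child inherits the parent's value unless it reinitializes explicitly.

```ruby
# Hypothetical stand-in for a per-process metric value (not the real MmapedValue).
class CachedGauge
  attr_reader :file

  def initialize(process_name)
    # The file name is derived once from the process name known at creation time.
    @file = "gauge_all_#{process_name}.db"
  end

  # Explicit reset, analogous in spirit to reinitializing after a pid change.
  def reinitialize(process_name)
    @file = "gauge_all_#{process_name}.db"
  end
end

# Forks a child and reports which file the child would write to.
def file_seen_by_child(gauge, reinitialize:)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    gauge.reinitialize("puma_0") if reinitialize
    writer.write(gauge.file)
    writer.close
    exit!(0)
  end
  writer.close
  result = reader.read
  reader.close
  Process.wait(pid)
  result
end

gauge = CachedGauge.new("puma_master")
stale = file_seen_by_child(gauge, reinitialize: false) # child still targets the master's file
fresh = file_seen_by_child(gauge, reinitialize: true)  # child targets its own file
```

Without the explicit reset, every forked child keeps targeting the file created for the master process, which matches the symptom of only seeing `puma_master` values.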
Starting with Ruby 3.1, connection_pool performs logic directly on fork:
This is triggered by puma here:
If we instrument before the fork, then instrumentation logic will get called on fork, which causes metrics to be re-initialized with the wrong pid.
Metrics continue to be attributed to the `puma_master`, because we do not
yet have the process name set:
- https://github.com/puma/puma/blob/v6.4.0/lib/puma/cluster.rb#L206
- https://github.com/puma/puma/blob/v6.4.0/lib/puma/cluster/worker.rb#L33
This process name is needed for proper attribution by `PidProvider`.
By deferring the instrumentation, we ensure no Prometheus metrics are
touched until the process name is set, and we can explicitly reinitialize
via `reinitialize_on_pid_change`.
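The ordering this relies on can be sketched as follows. `DeferredInstrumentation`, `register`, and `ready!` are hypothetical names used only for illustration and are not GitLab's actual API; the sketch assumes instrumentation can be queued as blocks and run once the worker has its process name and metrics have been reinitialized.

```ruby
# Hypothetical module; illustrates deferring instrumentation until the worker is ready.
module DeferredInstrumentation
  @pending = []
  @ready = false

  class << self
    # Queue the instrumentation block, or run it right away if already ready.
    def register(&block)
      if @ready
        block.call
      else
        @pending << block
      end
    end

    # Call once the process name is set and metrics have been reinitialized.
    def ready!
      @ready = true
      @pending.shift.call until @pending.empty?
    end
  end
end

events = []
# Registered early (before fork), but not executed yet.
DeferredInstrumentation.register { events << :connection_pool_instrumented }
events << :forked
events << :process_name_set
events << :metrics_reinitialized
DeferredInstrumentation.ready!
```

In this sketch the instrumentation block runs last, so nothing touches metrics between the fork and the explicit reinitialization.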
This allows the gauge metrics to be correctly attributed to `puma_1`, `puma_2`, etc.
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
- Start Rails in GDK. Wait a minute for PumaSampler to kick in.
- Check `tmp/prometheus_multiproc_dir/puma`: there is no `gauge_all_puma_0-0.db`.
- Check `curl -s 127.0.0.1:3000/-/metrics | grep '^ruby_file_descriptors'`: there is only `puma_master`.
- With this change applied, restart Rails in GDK. Wait a minute for PumaSampler to kick in.
- Check `tmp/prometheus_multiproc_dir/puma`: we now have `gauge_all_puma_0-0.db`.
- Check `curl -s 127.0.0.1:3000/-/metrics | grep '^ruby_file_descriptors'`: we now have pids `puma_0`, `puma_1`, `puma_master`.
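As an optional aid for the last check, the following is a minimal sketch of extracting the `pid` labels for one metric from the text exposition output. The helper name `pids_for_metric` and the sample values are illustrative assumptions, not real output; in practice the text would come from the `/-/metrics` endpoint used above.

```ruby
# Hypothetical helper: collects the distinct pid label values for one metric.
def pids_for_metric(metrics_text, metric_name)
  metrics_text.each_line
    .select { |line| line.start_with?(metric_name) }
    .filter_map { |line| line[/pid="([^"]+)"/, 1] }
    .uniq
    .sort
end

# Illustrative sample only; the values are made up.
sample = <<~METRICS
  # TYPE ruby_file_descriptors gauge
  ruby_file_descriptors{pid="puma_master"} 30
  ruby_file_descriptors{pid="puma_0"} 120
  ruby_file_descriptors{pid="puma_1"} 118
METRICS

pids = pids_for_metric(sample, "ruby_file_descriptors")
```

Before the fix the resulting list would be expected to contain only `puma_master`; after it, the worker pids should appear as well.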
refs #367527 (closed)