
Run RubySampler and ThreadSampler in Puma primary

Matthias Käppler requested to merge 363833-puma-primary-samplers into master

What does this MR do and why?

Our main source of information for Ruby memory use is the set of ruby_process_* metrics collected via RubySampler. This includes ruby_process_resident_memory_bytes, which represents process RSS.

However, this sampler currently only runs in Puma workers. This means that when taking aggregates in Thanos, such as summing up process RSS, the primary process is not accounted for:

Screenshot_from_2022-05-30_13-16-58

Here, pids 0-5 are all Puma workers.

This provides a misleading picture of actual RSS allocated to Rails processes, especially when cross-referencing this data with memory killer events, since the Puma worker killer reaps workers based on total cluster RSS, not just worker RSS.
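To make the undercount concrete, here is a minimal arithmetic sketch. The per-process RSS values and the worker labels are made up for illustration, and total_rss is a hypothetical helper, not part of this MR.

```ruby
# Hypothetical per-process RSS in bytes; all values are made up for illustration.
MIB = 1024 * 1024
rss_by_pid = {
  'puma_master' => 600 * MIB,
  'puma_0'      => 900 * MIB,
  'puma_1'      => 950 * MIB
}

# Sum RSS across processes, optionally leaving out the primary,
# which mirrors what the aggregate looks like when the primary reports nothing.
def total_rss(rss_by_pid, include_primary:)
  rss_by_pid.sum do |pid, bytes|
    !include_primary && pid == 'puma_master' ? 0 : bytes
  end
end

workers_only = total_rss(rss_by_pid, include_primary: false)
cluster      = total_rss(rss_by_pid, include_primary: true)

puts "workers only: #{workers_only / MIB} MiB"
puts "whole cluster: #{cluster / MIB} MiB"
```

With these assumed numbers, the workers-only sum misses the 600 MiB held by the primary, which is the gap a limit based on total cluster RSS would still count.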

This MR makes sure we also run two samplers in the Puma primary process:

  • RubySampler
  • ThreadSampler

This is accomplished by stopping these samplers and re-creating them whenever a worker forks, so that they do not inherit metrics data from the primary accidentally.
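The stop-and-recreate approach can be sketched as follows. This is a simplified illustration, not the code in this MR: SketchSampler and SamplerRegistry are hypothetical names, and the sampler only records the current pid instead of real metrics.

```ruby
# Hypothetical stand-in for a background sampler; not GitLab's actual class.
class SketchSampler
  attr_reader :samples

  def initialize(interval: 0.01)
    @interval = interval
    @samples = []
    @thread = nil
  end

  # Start a background thread that records a sample per interval.
  def start
    @thread = Thread.new do
      loop do
        @samples << Process.pid
        sleep @interval
      end
    end
    self
  end

  # Stop the background thread. Safe to call on a thread that is already
  # dead, which is the case in a forked child where only the forking
  # thread survives.
  def stop
    @thread&.kill
    @thread&.join
    @thread = nil
  end

  def running?
    !@thread.nil? && @thread.alive?
  end
end

# Hypothetical registry holding the samplers of the current process.
module SamplerRegistry
  def self.samplers
    @samplers ||= []
  end

  def self.start_all
    @samplers = [SketchSampler.new.start]
  end

  # Intended to run in a freshly forked worker: discard the sampler objects
  # (and their state) inherited from the primary, then create new ones.
  def self.restart_after_fork
    samplers.each(&:stop)
    start_all
  end
end
```

In the primary, SamplerRegistry.start_all would run once at boot. In each worker, SamplerRegistry.restart_after_fork would run after the fork, for example from a Puma hook such as on_worker_boot. Whether this MR wires it up through that particular hook is an assumption here.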

Screenshots or screen recordings

We can look at Prometheus to see the new metrics coming from the puma_master now:

Screenshot_from_2022-06-01_15-52-24

Screenshot_from_2022-06-01_15-52-34

How to set up and validate locally

  1. Make sure ApplicationSettings#prometheus_metrics_enabled is true
  2. Start rails-web
  3. Interrogate which threads are running (e.g. via Thread.list) or look at metrics emitted via /-/metrics (you can grep them for puma_*)
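For step 3, a small helper along these lines can filter the metrics output by name prefix. It is a sketch only: metric_lines is a hypothetical helper, and the sample text below is made up (including the values and the puma_0 label) rather than copied from a real /-/metrics response.

```ruby
# Return the sample lines of a Prometheus text exposition whose metric name
# starts with the given prefix. Comment lines (# HELP, # TYPE) are skipped.
def metric_lines(text, prefix)
  text.each_line
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?('#') }
      .select { |line| line.start_with?(prefix) }
end

# Made-up example payload; real output will differ.
sample = <<~METRICS
  # HELP ruby_process_resident_memory_bytes Memory used (RSS)
  # TYPE ruby_process_resident_memory_bytes gauge
  ruby_process_resident_memory_bytes{pid="puma_master"} 1000
  ruby_process_resident_memory_bytes{pid="puma_0"} 2000
  some_other_metric 1
METRICS

rss_lines = metric_lines(sample, 'ruby_process_resident_memory_bytes')
primary_lines = rss_lines.select { |line| line.include?('pid="puma_master"') }

puts rss_lines
puts "primary reports RSS: #{!primary_lines.empty?}"
```

In a Rails console, Thread.list.map(&:name) is a quick way to see which named threads are alive in the current process.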

I also verified that after sending kill -TERM to a worker, a replacement worker forks without problems and metrics still reset correctly.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #363833 (closed)

Edited by Matthias Käppler
