Run RubySampler and ThreadSampler in Puma primary (!89039) · Merge requests · GitLab.org / GitLab

Matthias Käppler requested to merge 363833-puma-primary-samplers into master Jun 01, 2022

What does this MR do and why?

Our main source of information for Ruby memory use is the set of ruby_process_* metrics collected via RubySampler. This includes ruby_process_resident_memory_bytes, which represents process RSS.

However, this sampler currently only runs in Puma workers. This means that when taking aggregates in Thanos, such as summing up process RSS, the primary process is not accounted for:

Here, pids 0-5 are all Puma workers.

This provides a misleading picture of actual RSS allocated to Rails processes, especially when cross-referencing this data with memory killer events, since the Puma worker killer reaps workers based on total cluster RSS, not just worker RSS.

This MR makes sure we also run two samplers in the Puma primary process:

RubySampler
ThreadSampler

This is accomplished by stopping these samplers and re-creating them whenever a worker forks, so that they do not inherit metrics data from the primary accidentally.
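The stop-and-recreate approach can be sketched roughly as follows. This is a minimal illustration, not the code from this MR: `SketchSampler` and `SamplerRegistry` are hypothetical names, and the sketch assumes the `restart` method would be called from whatever hook runs in a freshly forked worker.

```ruby
# Hypothetical stand-in for a sampler such as RubySampler: a background
# thread that periodically records a value into a per-instance store.
class SketchSampler
  attr_reader :samples

  def initialize(interval: 0.01)
    @interval = interval
    @samples = []
    @thread = nil
    @running = false
  end

  def start
    @running = true
    @thread = Thread.new do
      while @running
        @samples << Process.pid
        sleep @interval
      end
    end
    self
  end

  def stop
    @running = false
    # After a fork the inherited thread is already dead in the child,
    # so join returns immediately there.
    @thread&.join
    @thread = nil
  end

  def running?
    !@thread.nil? && @thread.alive?
  end
end

# Hypothetical registry that owns the current sampler instance.
class SamplerRegistry
  attr_reader :sampler

  def initialize(&factory)
    @factory = factory
    @sampler = nil
  end

  # Called once in the primary process.
  def start
    @sampler = @factory.call.start
  end

  # Assumed to be called in each worker right after it forks: the old
  # instance (and any data it collected in the primary) is discarded,
  # and a fresh one is created for the worker.
  def restart
    @sampler&.stop
    start
  end
end
```

Building a new instance rather than restarting the old one is what keeps state collected in the primary from leaking into the worker.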

Screenshots or screen recordings

We can look at Prometheus to see the new metrics coming from the puma_master now:

How to set up and validate locally

Make sure ApplicationSettings#prometheus_metrics_enabled is true
Start rails-web
Interrogate which threads are running (e.g. via Thread.list) or look at metrics emitted via /-/metrics (you can grep them for puma_*)
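For the thread check in the last step, a small helper along these lines can make the output of Thread.list easier to scan. The name `thread_summary` is made up for this sketch, and the thread names that actually show up depend on how each sampler names its thread, which is not stated here.

```ruby
# Hypothetical helper: list live threads in the current process with
# their name (if one was set) and status.
def thread_summary
  Thread.list.map do |thread|
    { name: thread.name || "(unnamed)", status: thread.status }
  end
end

thread_summary.each do |entry|
  puts "#{entry[:name]}: #{entry[:status]}"
end
```

Note that this only reports threads in the process where it is evaluated, so it has to run inside the Puma primary or a worker to be meaningful.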

I also verified that after sending SIGTERM to a worker (kill -TERM), a replacement worker is forked without problems, and that metrics still reset correctly.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Related to #363833 (closed)

Edited Jun 01, 2022 by Matthias Käppler
