Consider running RubySampler on Puma primary
Our main source of information for Ruby memory use is the ruby_process_* metrics collected via RubySampler. This includes ruby_process_resident_memory_bytes, which represents process RSS.
However, this sampler currently only runs in Puma workers. This means that when taking aggregates in Thanos, such as summing up process RSS, the primary process is not accounted for:
Here, pids 0-5 are all Puma workers.
This provides a misleading picture of actual RSS allocated to Rails processes, especially when cross-referencing this data with memory killer events, since the Puma worker killer reaps workers based on total cluster RSS, not just worker RSS.
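To make the gap concrete, here is a minimal sketch of the two aggregates. The RSS values are made up for illustration, and the process labels (puma_master, puma_0, puma_1) are assumptions about how processes are labelled, not data taken from our dashboards.

```ruby
MIB = 1024 * 1024

# Hypothetical RSS samples in bytes, keyed by an assumed process label.
SAMPLES = {
  "puma_master" => 600 * MIB,
  "puma_0" => 900 * MIB,
  "puma_1" => 950 * MIB
}.freeze

# What the aggregate shows today: only workers report RSS.
def worker_rss(samples)
  samples.reject { |label, _| label == "puma_master" }.values.sum
end

# What a cluster-wide memory limit is compared against: primary plus workers.
def cluster_rss(samples)
  samples.values.sum
end

puts "workers only: #{worker_rss(SAMPLES) / MIB} MiB"
puts "whole cluster: #{cluster_rss(SAMPLES) / MIB} MiB"
```

With these made-up numbers, the worker-only sum understates cluster RSS by the full size of the primary.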
We should consider running RubySampler on the puma_master as well. When doing this, we need to make sure to either stop and restart the sampler when forking into workers, or ensure that pidfiles are reset so that metrics from the primary do not leak into workers.
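The stop-and-restart option could look roughly like the sketch below. SketchSampler is a hypothetical stand-in and does not reflect the actual RubySampler API. It only illustrates the lifecycle: a sampler thread started in the primary does not survive fork, so each worker has to start its own and record its own pid.

```ruby
# Hypothetical stand-in for a sampler; not the real RubySampler.
class SketchSampler
  attr_reader :owner_pid

  def initialize(interval: 1)
    @interval = interval
    @thread = nil
    @owner_pid = nil
  end

  # Starts a background thread and remembers which process owns it.
  def start
    @owner_pid = Process.pid
    @thread = Thread.new do
      loop do
        sample
        sleep @interval
      end
    end
    self
  end

  def stop
    @thread&.kill
    @thread&.join
    @thread = nil
  end

  def running?
    !@thread.nil? && @thread.alive?
  end

  # Intended to be called in a worker right after fork: the thread inherited
  # from the primary is dead in the child, so drop it and start a fresh one
  # that is labelled with the worker's pid.
  def restart_after_fork
    stop
    start
  end

  private

  def sample
    # A real sampler would read RSS here and publish it labelled with @owner_pid.
  end
end
```

In this sketch the primary would call start before forking, and each worker would call restart_after_fork from whatever post-fork hook the server provides. That way a worker never publishes samples under the primary's label.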