[Feature flag] Roll out USS+PSS metrics collection
What
We introduced the collect_memory_uss_pss feature flag in #215864 (closed) to report new memory metrics that are potentially costly to read in production.
Before removing the toggle, we need to verify that these metrics are safe to collect and that the data is reported correctly.
UPDATE: Because using a feature flag caused unexpected problems with DB connectivity during app load, we moved this to an environment variable instead: enable_memory_uss_pss
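As an illustration of why an environment variable sidesteps the problem, the sketch below gates collection on enable_memory_uss_pss without touching the database. The helper name and the set of accepted truthy values are assumptions made for this example; the issue does not state how the variable is parsed.

```ruby
# Sketch only: the helper name and the accepted values are hypothetical.
TRUTHY_VALUES = %w[1 true yes on].freeze

# Returns true when the env var is set to a truthy value.
# Reading ENV needs no database connection, so it is safe during app load.
def uss_pss_collection_enabled?(env = ENV)
  value = env.fetch('enable_memory_uss_pss', '')
  TRUTHY_VALUES.include?(value.strip.downcase)
end
```

Passing a plain hash instead of ENV, for example uss_pss_collection_enabled?('enable_memory_uss_pss' => 'true'), makes the check easy to exercise in isolation.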
Owners
- Team: group::memory
- Most appropriate Slack channel to reach out to: #g_memory
- Best individual to reach out to: @mkaeppler
Expectations
What are we expecting to happen?
We should start seeing new Prometheus metrics on staging and in production:
- ruby_process_unique_memory_bytes
- ruby_process_proportional_memory_bytes
What might happen if this goes wrong?
We might see increased CPU usage due to RubySampler polling /proc/<pid>/smaps_rollup.
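To make the cost of each poll concrete, here is a minimal sketch of how USS and PSS could be derived from the contents of smaps_rollup. It assumes USS is the sum of the Private_Clean, Private_Dirty and Private_Hugetlb fields and PSS is the Pss field, all reported in kB. The method names and the sample data are hypothetical and are not taken from the RubySampler code.

```ruby
# Sketch only: names and sample values are illustrative, not GitLab's implementation.
PRIVATE_FIELDS = %w[Private_Clean Private_Dirty Private_Hugetlb].freeze

# Parses lines such as "Pss:   385 kB" into { "Pss" => 385 } (values in kB).
# Lines that do not match (e.g. the address-range header) are skipped.
def parse_smaps_rollup(text)
  text.each_line.with_object({}) do |line, fields|
    match = line.match(/\A(\w+):\s+(\d+) kB/)
    fields[match[1]] = match[2].to_i if match
  end
end

# Returns USS and PSS in bytes; missing fields count as zero.
def uss_and_pss_bytes(text)
  fields = parse_smaps_rollup(text)
  uss_kb = PRIVATE_FIELDS.sum { |name| fields.fetch(name, 0) }
  pss_kb = fields.fetch('Pss', 0)
  { uss: uss_kb * 1024, pss: pss_kb * 1024 }
end

# Made-up sample in the shape of a smaps_rollup file.
SAMPLE = <<~ROLLUP
  00400000-7ffd5a5fe000 ---p 00000000 00:00 0    [rollup]
  Rss:              884 kB
  Pss:              385 kB
  Shared_Clean:     696 kB
  Shared_Dirty:       0 kB
  Private_Clean:    120 kB
  Private_Dirty:     68 kB
ROLLUP

p uss_and_pss_bytes(SAMPLE)
```

On a Linux host that exposes this file, the same method could be fed File.read("/proc/#{Process.pid}/smaps_rollup"). The kernel has to walk the process's memory mappings to produce the file, which is where the extra CPU cost per sample would come from.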
What can we monitor to detect problems with this?
Probably CPU utilization on any node that runs the Ruby process sampler (web, sidekiq)
- https://dashboards.gitlab.net/d/Qe6veT_mk/fleet-utilization?orgId=1&refresh=5m
- https://dashboards.gitlab.net/d/general-service/general-service-platform-metrics?orgId=1&var-PROMETHEUS_DS=Global&var-environment=gstg&var-type=web&var-stage=main&var-sigma=2
Roll Out Steps
- Enable on staging
- Test on staging
- Ensure that documentation has been updated
- Enable on GitLab.com for individual groups/projects listed above and verify behaviour
- Coordinate a time to enable the flag with #production and #g_delivery on Slack
- Announce on the issue an estimated time this will be enabled on GitLab.com
- Enable on GitLab.com by running chatops command in #production
- Cross post chatops Slack command to #support_gitlab-com (more guidance when this is necessary in the dev docs) and in your team channel
- Announce on the issue that the flag has been enabled
- Remove feature flag and add changelog entry
- After the flag removal is deployed, clean up the feature flag by running chatops command in #production channel
Edited by Matthias Käppler