[Feature flag] Roll out USS+PSS metrics collection

What

We introduced the collect_memory_uss_pss feature flag in #215864 (closed) to report new memory metrics that are potentially costly to read in production.

Before removing the toggle, we need to verify that these metrics are safe to collect and that the data is reported correctly.

UPDATE: Because using a feature flag caused unexpected problems with DB connectivity during app load, we replaced it with an environment variable, enable_memory_uss_pss.
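As a rough illustration of gating on the environment variable rather than a feature flag, the check below reads enable_memory_uss_pss without touching the database. The helper name and the set of values treated as "enabled" are assumptions made for this sketch; the issue does not say how the application interprets the variable.

```ruby
# Hypothetical helper, not the application's actual code.
# Assumption: "1", "true", "yes" and "on" (case-insensitive) mean enabled.
def memory_uss_pss_enabled?(env = ENV)
  truthy_values = %w[1 true yes on]
  value = env['enable_memory_uss_pss'].to_s.strip.downcase
  truthy_values.include?(value)
end

# Reading ENV needs no DB connection, so this is safe to call during app load.
puts "USS/PSS collection enabled: #{memory_uss_pss_enabled?}"
```

Passing the environment as an argument keeps the check easy to exercise with a plain Hash.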

Owners

Team: groupmemory
Most appropriate Slack channel to reach out to: #g_memory
Best individual to reach out to: @mkaeppler

Expectations

What are we expecting to happen?

We should start seeing new Prometheus metrics on staging and in production:

ruby_process_unique_memory_bytes
ruby_process_proportional_memory_bytes
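One way to confirm that both metrics show up is to scan the raw text of a metrics scrape for their names. The sketch below assumes the scrape is in the Prometheus text exposition format. The helper name missing_metrics is hypothetical, and the pid label in the usage example is made up for illustration.

```ruby
# Hypothetical helper: returns the expected metric names that do not
# appear as samples in a Prometheus text-format scrape.
def missing_metrics(scrape_text, expected)
  seen = scrape_text.each_line.map do |line|
    # Skip comments (# HELP / # TYPE) and blank lines.
    next if line.start_with?('#') || line.strip.empty?

    # The metric name is the leading identifier, before any labels or value.
    line[/\A[a-zA-Z_:][a-zA-Z0-9_:]*/]
  end.compact

  expected - seen
end

expected = %w[
  ruby_process_unique_memory_bytes
  ruby_process_proportional_memory_bytes
]

# Made-up scrape payload that contains only one of the two metrics.
scrape = <<~TEXT
  # TYPE ruby_process_unique_memory_bytes gauge
  ruby_process_unique_memory_bytes{pid="example"} 1024
TEXT

puts missing_metrics(scrape, expected).inspect
```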

What might happen if this goes wrong?

We might see increased CPU usage because RubySampler polls /proc/<pid>/smaps_rollup.
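To make the cost concrete, here is a minimal sketch of what one such read involves: parsing the kB fields of smaps_rollup and deriving USS as the sum of private pages and PSS from the Pss field. This is not RubySampler's actual implementation; the function names are hypothetical, and the sketch assumes USS is computed as Private_Clean + Private_Dirty.

```ruby
# Parses the text of /proc/<pid>/smaps_rollup into a hash of field => bytes.
# Lines that are not "<Field>: <n> kB" (such as the address-range header)
# are ignored.
def parse_smaps_rollup(text)
  text.each_line.with_object({}) do |line, fields|
    match = line.match(/\A(\w+):\s+(\d+)\s+kB/)
    next unless match

    fields[match[1]] = match[2].to_i * 1024
  end
end

# Assumption: USS = Private_Clean + Private_Dirty; PSS = Pss.
def uss_and_pss(text)
  fields = parse_smaps_rollup(text)
  uss = fields.fetch('Private_Clean', 0) + fields.fetch('Private_Dirty', 0)
  { uss_bytes: uss, pss_bytes: fields.fetch('Pss', 0) }
end

# Usage on a Linux host whose kernel exposes smaps_rollup.
path = '/proc/self/smaps_rollup'
puts uss_and_pss(File.read(path)) if File.readable?(path)
```

Each call makes the kernel walk the process's memory mappings to build the rollup, which is why the sampling interval matters for CPU usage.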

What can we monitor to detect problems with this?

Most likely CPU utilization on any node that runs the Ruby process sampler (web, sidekiq):

https://dashboards.gitlab.net/d/Qe6veT_mk/fleet-utilization?orgId=1&refresh=5m
https://dashboards.gitlab.net/d/general-service/general-service-platform-metrics?orgId=1&var-PROMETHEUS_DS=Global&var-environment=gstg&var-type=web&var-stage=main&var-sigma=2

Roll Out Steps

Enable on staging
Test on staging
Ensure that documentation has been updated
Enable on GitLab.com for individual groups/projects listed above and verify behaviour
Coordinate a time to enable the flag with #production and #g_delivery on Slack
Announce on the issue an estimated time this will be enabled on GitLab.com
Enable on GitLab.com by running chatops command in #production
Cross-post the chatops Slack command to #support_gitlab-com (the dev docs have more guidance on when this is necessary) and to your team channel
Announce on the issue that the flag has been enabled
Remove feature flag and add changelog entry
After the flag removal is deployed, clean up the feature flag by running chatops command in #production channel

Edited May 15, 2020 by Matthias Käppler
