[Feature Flag] Roll out cgroups
What
Enable the gitaly_run_cmds_in_cgroup feature flag. When enabled, Gitaly spawns processes in cgroups so that they can be killed if they breach configured memory or CPU thresholds.
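As a rough illustration of the mechanism (not Gitaly's actual implementation), the sketch below shows how a process is placed under a memory-limited cgroup through the cgroup filesystem. It assumes cgroup v2 file names (memory.max, cgroup.procs); the helper names and the example path are hypothetical, and the demo writes to a temporary directory rather than a real /sys/fs/cgroup mount.

```python
import os
import tempfile
from pathlib import Path


def configure_cgroup(cgroup_dir: Path, memory_limit_bytes: int) -> None:
    """Create the cgroup directory and set its memory ceiling.

    Assumption: cgroup v2 naming (memory.max). On cgroup v1 the
    equivalent file is memory.limit_in_bytes.
    """
    cgroup_dir.mkdir(parents=True, exist_ok=True)
    (cgroup_dir / "memory.max").write_text(f"{memory_limit_bytes}\n")


def add_process(cgroup_dir: Path, pid: int) -> None:
    """Move one process into the cgroup by writing its PID to cgroup.procs."""
    with open(cgroup_dir / "cgroup.procs", "w") as procs:
        procs.write(f"{pid}\n")


if __name__ == "__main__":
    # A temporary directory stands in for the real cgroup mount point,
    # so this demo needs no root privileges and changes nothing on the host.
    with tempfile.TemporaryDirectory() as root:
        cgroup = Path(root) / "gitaly" / "example-0"  # hypothetical layout
        configure_cgroup(cgroup, 512 * 1024 * 1024)
        add_process(cgroup, os.getpid())
        print("memory.max:", (cgroup / "memory.max").read_text().strip())
```

On a real host, once a process in the cgroup exceeds the memory limit and nothing can be reclaimed, the kernel's OOM killer terminates a process inside that cgroup instead of affecting the rest of the machine.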
Owners
- Team: Gitaly
- Most appropriate Slack channel to reach out to: #g_create_gitaly
- Best individual to reach out to: jcaigitlab
Expectations
What release does this feature occur in first?
What are we expecting to happen?
When a process tries to use more memory than the configured limit for cgroups, the process should be killed.
What might happen if this goes wrong?
We might kill too many processes if the limit is set too low.
Dashboard: see the two cgroup graphs at the bottom of https://dashboards.gitlab.net/d/000000214/gitaly-fleet-overview?orgId=1
What can we monitor to detect problems with this?
If more than a few processes are being killed, that is a problem and likely means the limit is set too low.
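One way to spot kills directly on a host, sketched under assumptions: the kernel exposes an oom_kill counter in a cgroup's flat-keyed status file (memory.events on cgroup v2, memory.oom_control on recent cgroup v1 kernels). The helper names below are hypothetical, and the sample text is illustrative, not real output from a Gitaly node.

```python
from typing import Dict


def parse_flat_keyed(text: str) -> Dict[str, int]:
    """Parse 'key value' lines, as used by cgroup status files."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "name <non-negative integer>" lines.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def oom_kills_since(previous: str, current: str) -> int:
    """Return how many OOM kills happened between two snapshots of the file."""
    before = parse_flat_keyed(previous).get("oom_kill", 0)
    after = parse_flat_keyed(current).get("oom_kill", 0)
    # A counter reset (for example, a recreated cgroup) is reported as 0.
    return max(after - before, 0)


if __name__ == "__main__":
    # Illustrative snapshots in the cgroup v2 memory.events layout.
    earlier = "low 0\nhigh 0\nmax 3\noom 1\noom_kill 1\n"
    later = "low 0\nhigh 0\nmax 9\noom 3\noom_kill 3\n"
    print("new OOM kills:", oom_kills_since(earlier, later))
```

Comparing two snapshots rather than reading a single value separates kills that happened during the rollout from kills that predate it.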
Roll Out Steps
- Enable on staging-ref
- Enable on staging
  - Is the required code deployed on staging? (howto)
  - Enable on staging (howto)
  - Add featureflagstaging to this issue (howto)
  - Test on staging (howto)
  - Verify the feature flag was used by checking Prometheus metric gitaly_feature_flag_checks_total
- Enable on production
  - Is the required code deployed on production? (howto)
  - Enable on production in #production (howto)
  - Add featureflagproduction to this issue
  - Verify the feature flag was used by checking Prometheus metric gitaly_feature_flag_checks_total
- Default-enable the feature flag (optional, only required if backwards-compatibility concerns exist)
  - Wait for release containing default-disabled feature flag.
  - Change the feature flag to default-enabled (howto)
  - Wait for release containing default-enabled feature flag.
- Remove feature flag
Please refer to the documentation of feature flags for further information.
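The verification steps above check gitaly_feature_flag_checks_total in Prometheus. A minimal sketch of what that check could look like follows. It assumes the metric carries flag and enabled labels and that the flag label value matches the flag name used in this issue; both should be confirmed against the actual metric. The response parsing follows the standard Prometheus HTTP API instant-query format, and the sample response is made up for illustration.

```python
import json
from typing import Dict

METRIC = "gitaly_feature_flag_checks_total"


def build_query(flag: str, window: str = "5m") -> str:
    """Build a PromQL query for the per-state check rate of one flag.

    Assumption: the metric has 'flag' and 'enabled' labels.
    """
    return f'sum by (enabled) (rate({METRIC}{{flag="{flag}"}}[{window}]))'


def checks_by_state(response_body: str) -> Dict[str, float]:
    """Summarize an instant-query response as {enabled label: rate}."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    rates: Dict[str, float] = {}
    for series in payload["data"]["result"]:
        state = series["metric"].get("enabled", "unknown")
        # Instant vectors carry [timestamp, "value-as-string"].
        rates[state] = float(series["value"][1])
    return rates


if __name__ == "__main__":
    print(build_query("gitaly_run_cmds_in_cgroup"))
    # Made-up response body in the Prometheus /api/v1/query format.
    sample = json.dumps({
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"enabled": "true"}, "value": [1700000000, "12.5"]},
                {"metric": {"enabled": "false"}, "value": [1700000000, "0"]},
            ],
        },
    })
    print(checks_by_state(sample))
```

A non-zero rate for enabled="true" after the flag is switched on suggests the flag is being evaluated as enabled; a rate only for enabled="false" suggests the change has not taken effect.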
Edited by John Cai