Implement memory and CPU limits for the Prometheus processes in VMs
Summary
Implement CPU and memory limits on the Prometheus VMs for the prometheus and thanos-sidecar processes, so that they cannot use up all the available memory and CPU and leave us unable to SSH into the machine to debug it.
Related Incident(s)
Originating issue(s):
Desired Outcome/Acceptance Criteria
We have process memory and CPU limits for thanos-sidecar and prometheus in the following Chef roles:
- roles/gprd-infra-prometheus-server.json
- roles/gprd-infra-prometheus-app.json
- roles/gprd-infra-prometheus-db.json
- roles/gstg-infra-prometheus-server.json
- roles/gstg-infra-prometheus-server-db.json
- roles/gstg-infra-prometheus-server-app.json
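As a rough illustration of what such limits could look like on a host, here is a minimal sketch that renders a systemd drop-in using the `MemoryMax=` and `CPUQuota=` resource-control directives. It assumes the processes run as systemd units named `prometheus.service` and `thanos-sidecar.service` on a cgroup v2 host; the unit names, the drop-in file name, and the limit values are all hypothetical placeholders, and how the Chef cookbook would actually expose these settings is not covered here.

```python
def render_limits_dropin(memory_max: str, cpu_quota_percent: int) -> str:
    """Return the text of a systemd drop-in that caps memory and CPU for one unit."""
    if cpu_quota_percent <= 0:
        raise ValueError("cpu_quota_percent must be positive")
    lines = [
        "[Service]",
        # Hard memory cap for the unit's cgroup (cgroup v2 directive).
        f"MemoryMax={memory_max}",
        # CPU time cap; 100% corresponds to one full CPU.
        f"CPUQuota={cpu_quota_percent}%",
    ]
    return "\n".join(lines) + "\n"


def dropin_path(service: str) -> str:
    """Conventional drop-in location; the unit name is an assumption."""
    return f"/etc/systemd/system/{service}.service.d/limits.conf"


# Hypothetical values for illustration only; real limits need to be sized
# per host so that enough memory and CPU remain for sshd and debugging tools.
LIMITS = {
    "prometheus": {"memory_max": "48G", "cpu_quota_percent": 600},
    "thanos-sidecar": {"memory_max": "4G", "cpu_quota_percent": 100},
}

if __name__ == "__main__":
    for service, limits in LIMITS.items():
        print(f"# {dropin_path(service)}")
        print(render_limits_dropin(**limits))
```

The point of capping both processes below the machine total is to keep headroom for SSH access, which is the failure mode described in the summary.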
Associated Services
Service::Prometheus in Production Engineering
Corrective Action Issue Checklist
- Link the incident(s) this corrective action arose out of
- Give context for what problem this corrective action is trying to prevent from re-occurring
- Assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
- Assign a priority (this will default to 'Reliability::P4')