
Implement memory and CPU limits for the Prometheus processes in VMs

Summary

Implement CPU and memory limits on the Prometheus VMs for the prometheus and thanos-sidecar processes, so that they cannot consume all of the available memory and CPU. When they do, we are unable to SSH into the machine to debug it.
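One possible way to size the limits is sketched below. It rests on assumptions the issue does not state: that prometheus and thanos-sidecar run as systemd units on a cgroup v2 host, and that limits are expressed with systemd's `MemoryMax=` and `CPUQuota=` directives. The idea is to hold back a fixed reserve of memory and CPU for sshd and the OS, then split the remainder between the two processes by weight. The reserve sizes and weights are hypothetical. If the cookbooks supervise these processes with something other than systemd (for example runit), the same budget would need a different enforcement mechanism.

```python
"""Sketch: derive systemd resource limits for the Prometheus VM processes.

Assumptions (hypothetical, not from the issue): both processes run as systemd
units on a cgroup v2 host, and a fixed reserve of memory and CPU is kept back
for sshd and the rest of the OS so the VM stays reachable.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    memory_max_bytes: int  # value for MemoryMax= (a hard cap, in bytes)
    cpu_quota_percent: int  # value for CPUQuota=; 100% equals one full CPU


def split_budget(total_mem_bytes: int, cpus: int, reserve_mem_bytes: int,
                 reserve_cpus: int, weights: dict[str, int]) -> dict[str, Limits]:
    """Split what remains after the OS reserve between units by integer weight."""
    if reserve_mem_bytes >= total_mem_bytes or reserve_cpus >= cpus:
        raise ValueError("the reserve must leave capacity for the services")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    usable_mem = total_mem_bytes - reserve_mem_bytes
    usable_cpu_pct = (cpus - reserve_cpus) * 100
    # Integer arithmetic rounds each share down, so the sum never exceeds the budget.
    return {
        unit: Limits(usable_mem * w // total_weight, usable_cpu_pct * w // total_weight)
        for unit, w in weights.items()
    }


def render_dropin(limits: Limits) -> str:
    """Render the [Service] section of a systemd drop-in override file."""
    return (
        "[Service]\n"
        f"MemoryMax={limits.memory_max_bytes}\n"
        f"CPUQuota={limits.cpu_quota_percent}%\n"
    )
```

For a hypothetical 64 GiB, 8-CPU VM with 4 GiB and one CPU held back and a 9:1 split, prometheus would get `MemoryMax=57982058496` and `CPUQuota=630%`, and thanos-sidecar 6442450944 bytes and 70%. Because `MemoryMax=` is a hard cap, a runaway process is typically OOM-killed inside its own cgroup instead of exhausting memory for the whole VM.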

Related Incident(s)

Originating issue(s):

Desired Outcome/Acceptance Criteria

Memory and CPU limits are in place for the thanos-sidecar and prometheus processes in the following Chef roles:

  1. roles/gprd-infra-prometheus-server.json
  2. roles/gprd-infra-prometheus-app.json
  3. roles/gprd-infra-prometheus-db.json
  4. roles/gstg-infra-prometheus-server.json
  5. roles/gstg-infra-prometheus-server-db.json
  6. roles/gstg-infra-prometheus-server-app.json
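The acceptance criteria could be checked mechanically. The sketch below loads each listed role file and reports which limit attributes are missing from both `override_attributes` and `default_attributes`. Those two keys are standard in Chef role JSON, but the attribute paths in `REQUIRED_PATHS` are hypothetical placeholders for whatever keys the Prometheus and Thanos cookbooks actually read.

```python
"""Sketch: verify that each Chef role in the acceptance criteria sets the limits.

REQUIRED_PATHS is hypothetical; replace it with the attribute keys the
Prometheus and Thanos cookbooks actually consume.
"""
from __future__ import annotations

import json
from pathlib import Path

ROLES = [
    "roles/gprd-infra-prometheus-server.json",
    "roles/gprd-infra-prometheus-app.json",
    "roles/gprd-infra-prometheus-db.json",
    "roles/gstg-infra-prometheus-server.json",
    "roles/gstg-infra-prometheus-server-db.json",
    "roles/gstg-infra-prometheus-server-app.json",
]

# Hypothetical attribute paths: one memory and one CPU limit per process.
REQUIRED_PATHS = [
    ("prometheus", "limits", "memory_max"),
    ("prometheus", "limits", "cpu_quota"),
    ("thanos", "sidecar", "limits", "memory_max"),
    ("thanos", "sidecar", "limits", "cpu_quota"),
]


def lookup(attrs: dict, path: tuple[str, ...]):
    """Walk nested dicts along path; return None if any key is absent."""
    node = attrs
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def missing_limits(role: dict) -> list[str]:
    """Return dotted paths set in neither override_attributes nor default_attributes."""
    missing = []
    for path in REQUIRED_PATHS:
        found = any(
            lookup(role.get(section, {}), path) is not None
            for section in ("override_attributes", "default_attributes")
        )
        if not found:
            missing.append(".".join(path))
    return missing


def check_repo(root: Path) -> dict[str, list[str]]:
    """Map each role file (relative to the chef-repo root) to its missing limits."""
    return {
        rel: missing_limits(json.loads((root / rel).read_text()))
        for rel in ROLES
    }
```

Calling `check_repo(Path("path/to/chef-repo"))` would return an empty list for every role once the criteria are met.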

Associated Services

Service::Prometheus in Production Engineering

Corrective Action Issue Checklist

  • Link the incident(s) this corrective action arose out of
  • Give context for what problem this corrective action is trying to prevent from recurring
  • Assign a severity label (the highest severity of the related incidents; defaults to 'severity::4')
  • Assign a priority (defaults to 'Reliability::P4')