chore: right-size self-cadvisor resource limits
What
Right-size self-cadvisor container resources, mirroring the !238 (merged) pattern that bumped monitoring_flask_backend.
Docker Compose (docker-compose.yml)
| | Before | After |
|---|---|---|
| CPU limit | 0.15 cores | 0.25 cores |
| Memory limit | 192 MiB (192m) | 384 MiB (402653184 bytes) |
Helm (postgres_ai_helm/values.yaml)
Previously resources: {} (no requests or limits). Now:
- Requests: CPU 100m, memory 192Mi
- Limits: CPU 250m, memory 384Mi
The existing templates/cadvisor-daemonset.yaml already wires .Values.cadvisor.resources through, so no template change is needed.
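The Compose and Helm defaults are written in different units (cores and bytes vs. millicores and Mi), so a quick cross-check that they describe the same limits can be useful. The following is a minimal sketch, not part of this MR: the helper names `cpu_cores` and `memory_bytes` are hypothetical, and the parser only handles the quantity forms that appear above (an `m` CPU suffix and the binary `Ki`/`Mi`/`Gi` memory suffixes), not the full Kubernetes quantity grammar.

```python
from decimal import Decimal

# Binary memory suffixes (a subset of what Kubernetes accepts).
_BINARY = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_cores(quantity: str) -> Decimal:
    """Convert a CPU quantity such as "250m" or "0.25" to cores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "384Mi", or a plain byte count, to bytes."""
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


# Values taken from this MR description.
helm = {
    "requests": {"cpu": "100m", "memory": "192Mi"},
    "limits": {"cpu": "250m", "memory": "384Mi"},
}
compose_limits = {"cpu": "0.25", "memory": "402653184"}

# Helm and Compose limits should describe the same ceiling.
assert cpu_cores(helm["limits"]["cpu"]) == cpu_cores(compose_limits["cpu"])
assert memory_bytes(helm["limits"]["memory"]) == memory_bytes(compose_limits["memory"])

# Requests must not exceed limits, or Kubernetes rejects the pod spec.
assert cpu_cores(helm["requests"]["cpu"]) <= cpu_cores(helm["limits"]["cpu"])
assert memory_bytes(helm["requests"]["memory"]) <= memory_bytes(helm["limits"]["memory"])
```

`Decimal` is used for CPU so that 250m and 0.25 compare exactly rather than through floating point.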
Why
self-cadvisor was observed reporting Up N (unhealthy) on a real monitoring VM (mon-stardex, us-east-2) with 11+ containers under steady-state load. cAdvisor walks every cgroup on each housekeeping pass; RSS scales with container count and the metric set. The 192 MiB cap is at the edge for a typical monitoring host, so a conservative bump to 384 MiB removes the immediate memory-pressure failure mode without significantly overcommitting.
The CPU bump from 0.15 to 0.25 follows the same conservative direction; cAdvisor is bursty during housekeeping passes and the prior 0.15 cap could throttle scrape responses.
Out of scope: HPA/VPA, environment-specific overrides, and cAdvisor flag tuning (e.g. extra --disable_metrics entries). This MR only raises the default resources in Compose and Helm.
Validation
- python3 -c "import yaml; yaml.safe_load(open('docker-compose.yml'))" (exit 0)
- python3 -c "import yaml; yaml.safe_load(open('postgres_ai_helm/values.yaml'))" (exit 0)
- helm not available locally; templates/cadvisor-daemonset.yaml already references .Values.cadvisor.resources via {{- with ... }}, so the new block renders without template changes.
- Compose syntax check skipped locally (docker compose config --quiet requires VM_AUTH_PASSWORD env in this tree); YAML parse confirms structure.
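The YAML parse above only confirms that the files are well-formed; it does not confirm that the new values landed under the right keys. A slightly stronger check could look like the sketch below. It was not run as part of this MR. The function name `check_cadvisor_resources` is hypothetical, and the nested `requests`/`limits` layout with `cpu` and `memory` keys is assumed from the standard Kubernetes resources shape rather than copied from values.yaml.

```python
# Expected defaults, as listed in the Helm section of this MR.
EXPECTED = {
    "requests": {"cpu": "100m", "memory": "192Mi"},
    "limits": {"cpu": "250m", "memory": "384Mi"},
}


def check_cadvisor_resources(values: dict) -> list[str]:
    """Return one message per mismatch between values['cadvisor']['resources'] and EXPECTED."""
    resources = (values.get("cadvisor") or {}).get("resources") or {}
    problems = []
    for section, expected in EXPECTED.items():
        actual = resources.get(section) or {}
        for key, want in expected.items():
            got = actual.get(key)
            # Compare as strings so quoted and unquoted YAML scalars are treated alike.
            if str(got) != want:
                problems.append(
                    f"cadvisor.resources.{section}.{key}: expected {want}, got {got}"
                )
    return problems


# The previous default (resources: {}) fails every check; the new block passes.
old_values = {"cadvisor": {"resources": {}}}
new_values = {"cadvisor": {"resources": EXPECTED}}
assert len(check_cadvisor_resources(old_values)) == 4
assert check_cadvisor_resources(new_values) == []
```

To run it against the real file, pass in the mapping returned by yaml.safe_load(open('postgres_ai_helm/values.yaml')), the same PyYAML call used in the commands above.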
Follow-up
Recommended: profile cAdvisor's actual peak RSS under steady-state load (11 containers, ~6 metric scrapes/min) on a representative monitoring VM. The 384 MiB / 0.25 CPU values are a conservative bump and should be revisited if profiling shows a different ceiling, or if --disable_metrics flags can be added to reduce the footprint instead of raising limits.
Closes #172 (closed)
Related: !238 (merged)