Right-size self-cadvisor resource limits (still cpus: 0.15 / mem_limit: 192m post-!238)
## Summary
[!238](https://gitlab.com/postgres-ai/postgresai/-/merge_requests/238) raised `monitoring_flask_backend` from `cpus: 0.1 / mem_limit: 192m` to `cpus: 0.5 / mem_limit: 1Gi` to stop a sustained gunicorn-worker OOM loop. **`self-cadvisor` still has the original `cpus: 0.15 / mem_limit: 192m`** in `docker-compose.yml`:
```yaml
self-cadvisor:
  image: gcr.io/cadvisor/cadvisor:v0.51.0
  container_name: self-cadvisor
  cpus: 0.15
  mem_limit: 192m
  privileged: true
  ...
```
## Why it matters
- cAdvisor walks every cgroup on the host on each housekeeping pass. With 11+ containers on a monitoring VM (and Docker volume metadata for each), 192 MiB is at the edge.
- Reproduced behavior on `mon-stardex` (us-east-2): `self-cadvisor` reported `Up 2 days (unhealthy)` during the disk-full incident on 2026-04-28, and dropped out of the docker network on the subsequent host stop/start. Memory pressure under load is the most likely cause of the unhealthy state.
- Symptoms: Grafana panels fed by cAdvisor metrics (per-container CPU/mem, container restart counts) showing "No data" or stale data on otherwise healthy hosts.
## Proposed fix
Decide on right-sizing — e.g. align with the `!238` pattern:
```yaml
self-cadvisor:
  cpus: 0.25
  mem_limit: 384m
```
384 MiB is a conservative bump; cAdvisor's RSS scales with container count and metric set. It is worth profiling actual peak RSS on a monitoring VM under steady-state load (11 containers, ~6 metric scrapes/min) before settling on a final number. Also check whether any metric groups can be turned off with `--disable_metrics`, which would reduce the footprint instead of just bumping limits.
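A minimal sketch of what the footprint-reduction option could look like next to the proposed limits. It assumes the service does not already set `command:` and that the Grafana dashboards do not use any of the disabled metric groups. The metric list and the 30s interval are illustrative, not measured recommendations.

```yaml
self-cadvisor:
  cpus: 0.25
  mem_limit: 384m
  command:
    # Only track Docker containers rather than every cgroup on the host.
    - --docker_only=true
    # Housekeeping interval; 30s is an illustrative value.
    - --housekeeping_interval=30s
    # Illustrative list; setting this flag replaces cAdvisor's default disabled set.
    - --disable_metrics=percpu,sched,tcp,udp,hugetlb,referenced_memory,cpu_topology,resctrl
```

If any per-container panel goes empty after this change, the corresponding metric group should come back out of the `--disable_metrics` list.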
## Test plan
- [ ] Profile `self-cadvisor` RSS over ~1 hour on a representative monitoring VM
- [ ] Bump limits in `docker-compose.yml` and helm chart values
- [ ] Confirm `self-cadvisor` stays `(healthy)` for ≥ 24h, no OOM kills in `journalctl -k --grep "oom-kill"`
- [ ] Confirm Grafana per-container panels populate
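
The profiling step could be sketched as below. It samples `docker stats` rather than process RSS directly, so it reports the memory usage Docker accounts against `mem_limit`. The function names, the one-minute interval, and the sample count are assumptions made for illustration.

```shell
# Convert a docker-stats size such as "150.3MiB" or "1.5GiB" to whole MiB (truncated).
to_mib() {
  awk -v v="$1" 'BEGIN {
    n = v + 0                 # numeric prefix of the value
    u = v
    sub(/^[0-9.]+/, "", u)    # remaining unit suffix
    if (u == "GiB") n *= 1024
    else if (u == "KiB") n /= 1024
    else if (u == "B") n /= 1048576
    printf "%d\n", n
  }'
}

# Read one size per line on stdin and print the largest, in MiB.
peak_mib() {
  peak=0
  while read -r size; do
    [ -n "$size" ] || continue
    mib=$(to_mib "$size")
    if [ "$mib" -gt "$peak" ]; then peak=$mib; fi
  done
  echo "$peak"
}

# Print the container's current memory usage once per interval, count times.
sample_mem() {
  name=$1
  count=$2
  interval=$3
  i=0
  while [ "$i" -lt "$count" ]; do
    # MemUsage looks like "150.3MiB / 192MiB"; keep only the usage part.
    docker stats --no-stream --format '{{.MemUsage}}' "$name" | awk '{print $1}'
    i=$((i + 1))
    sleep "$interval"
  done
}

# Example: one sample per minute for an hour, then print the peak in MiB.
# sample_mem self-cadvisor 60 60 | peak_mib
```

A peak that sits close to the current 192 MiB limit would support the bump. A peak well below it would point towards a different cause for the unhealthy state.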
## Related
- !238 (merged, fixes flask backend resources)
- postgres-ai/postgresai#XXX (sibling issue: extend !239 restart-policy coverage to `self-cadvisor`)
- Ops-side write-up: postgres-ai/infra#51 (item 5: `flask-pgss-api` OOM, fixed; this issue tracks the next-most-undersized container)