feat(compose): parameterize per-service cpus/mem_limit and VictoriaMetrics search flags

Closes #176 (closed)

Summary

Builds on main's existing ${VM_RETENTION_PERIOD:-336h} (introduced by !259 (merged)) and adds env-var indirection where main still hard-codes literal values:

  • Per-service cpus: and mem_limit: for all 12 services, now read from ${<SERVICE>_CPUS:-...} / ${<SERVICE>_MEM:-...} (e.g. SINK_PROMETHEUS_MEM, FLASK_MEM, CADVISOR_CPUS, TARGET_STANDBY_MEM). Memory defaults are written in bytes for consistency with !238 (merged) / !248 (merged).
  • Two more VictoriaMetrics tuning flags on sink-prometheus's command — only the flags already on main, no new ones:
    • -search.maxQueryDuration → ${VM_QUERY_DURATION:-30s}
    • -search.maxConcurrentRequests → ${VM_MAX_CONCURRENT_REQUESTS:-16}
  • .env.example documents every new variable, commented-out, with default + one-line description.
  • Contract tests in tests/compliance_vectors/test_compose_parameterization.py (TDD: test commit first) verify (a) defaults match main's literals and (b) env-var overrides actually take effect for representative services.
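For reviewers checking the byte-valued memory defaults, a small helper along these lines can convert a Docker-style size into the byte count that would appear in the template. This is only a sketch: `to_bytes` is a hypothetical name, not part of this MR, and the sample sizes are illustrative rather than the actual defaults. It relies on Docker treating the k/m/g suffixes as binary multiples.

```python
# Docker interprets the b/k/m/g suffixes on memory sizes as binary multiples.
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def to_bytes(size: str) -> int:
    """Convert a Docker-style size such as '512m' to a plain byte count.

    Hypothetical helper for illustration; only integer sizes are handled.
    """
    text = size.strip().lower()
    if text.isdigit():
        # Already a bare byte count.
        return int(text)
    if len(text) < 2 or text[-1] not in _UNITS or not text[:-1].isdigit():
        raise ValueError(f"unsupported size: {size!r}")
    return int(text[:-1]) * _UNITS[text[-1]]


# Illustrative values, not the MR's actual defaults.
print(to_bytes("512m"))  # 536870912
print(to_bytes("2g"))    # 2147483648
```

An override such as FLASK_MEM can then be given as the printed number, keeping it in the same unit as the defaults.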

26 new env vars total (12 services × _CPUS + _MEM, plus the 2 VM flag vars). Defaults match main's pre-MR literals byte-for-byte; this is a no-op when env vars are unset.
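The no-op claim rests on how Compose resolves `${NAME:-default}`: the default is used when the variable is unset or empty, otherwise the variable's value wins. The sketch below mimics only that one form in plain Python to make the behavior concrete. The `resolve` function and the `-flag=value` spelling of the command-line argument are assumptions made for illustration, not code or text taken from this MR.

```python
import re

# Matches only the ${NAME:-default} form; other interpolation forms
# supported by Compose are deliberately not handled in this sketch.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):-([^}]*)\}")


def resolve(template: str, env: dict) -> str:
    """Substitute ${NAME:-default} using env, falling back to the default."""

    def pick(match):
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        # ':-' falls back when the variable is unset OR empty.
        return value if value else default

    return _PATTERN.sub(pick, template)


# Assumed flag spelling; the variable name and default come from the summary.
flag = "-search.maxQueryDuration=${VM_QUERY_DURATION:-30s}"
print(resolve(flag, {}))                             # -search.maxQueryDuration=30s
print(resolve(flag, {"VM_QUERY_DURATION": "120s"}))  # -search.maxQueryDuration=120s
```

With nothing set, every template collapses to the literal it replaced, which is why the rendered configuration should be unchanged.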

Test plan

  • python3 -c "import yaml; yaml.safe_load(open('docker-compose.yml'))" exits 0.
  • PGAI_TAG=0.15.0-rc1 REPLICATOR_PASSWORD=x VM_AUTH_USERNAME=x VM_AUTH_PASSWORD=x docker compose -f docker-compose.yml config --quiet exits 0 with no monitoring tuning env vars set.
  • python3 -m pytest tests/compliance_vectors/test_compose_parameterization.py tests/compliance_vectors/test_flask_resources.py tests/compliance_vectors/test_cadvisor_resources.py — 37 passed, 1 skipped (helm not available locally).
  • bash tests/compliance_vectors/check_compose_retention_config.sh — all five retention scenarios pass with the new template form.
  • First commit (test: assert resource limits resolve from env vars ...) fails against current main (TDD red), confirming the new tests actually exercise the indirection.
  • CI pipeline green.
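To illustrate why the test-first commit goes red on main, here is a minimal sketch of the kind of check that would flag a hard-coded limit. It is not the MR's actual test: `hard_coded_limits` is a hypothetical helper, the compose fragments and their values are made up, and the line-based regex ignores YAML details such as trailing comments.

```python
import re

_LIMIT_LINE = re.compile(r"^\s*(cpus|mem_limit):\s*(.+?)\s*$")
# A value counts as parameterized if it is a single ${NAME:-default},
# optionally wrapped in quotes.
_TEMPLATED = re.compile(r"^[\"']?\$\{[A-Z][A-Z0-9_]*:-[^}]+\}[\"']?$")


def hard_coded_limits(compose_text: str) -> list:
    """Return (line_number, key) for cpus/mem_limit values that are literals."""
    findings = []
    for line_no, line in enumerate(compose_text.splitlines(), start=1):
        match = _LIMIT_LINE.match(line)
        if match and not _TEMPLATED.match(match.group(2)):
            findings.append((line_no, match.group(1)))
    return findings


# Hypothetical fragments; layout and values are illustrative only.
before = """services:
  flask:
    cpus: "0.5"
    mem_limit: 268435456
"""
after = """services:
  flask:
    cpus: "${FLASK_CPUS:-0.5}"
    mem_limit: ${FLASK_MEM:-268435456}
"""
print(hard_coded_limits(before))  # [(3, 'cpus'), (4, 'mem_limit')]
print(hard_coded_limits(after))   # []
```

A check of this shape fails while literals remain and passes once every limit uses the template form, which is the red-to-green transition described in the test plan.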

What's NOT changed

  • Defaults preserved. Every parameterized value resolves to main's pre-MR literal when unset. Laptop-dev workflow unaffected.
  • No new VictoriaMetrics flags. -memory.allowedPercent and -search.maxQueueDuration are deliberately out of scope — they aren't on main's sink-prometheus command today, and adding them would be new behavior, not parameterization.
  • No production tunings. This MR only makes prod tuning possible without forking compose. Bumping actual prod defaults is a follow-up on the provisioning playbook.
  • Helm chart untouched. postgres_ai_helm/ already exposes resource overrides via values.yaml (!238 (merged) for Flask, !248 (merged) for cAdvisor). This MR is the compose-side counterpart.
  • init-configs.sh and configs image untouched — separate concern (see !250 (merged)).
Edited by Nikolay Samokhvalov
