fix(cli): migrate .env on mon update / mon update-config (closes #203)

Summary

Closes #203: in-place 0.14 → 0.15 upgrade fails silently when VM_AUTH_USERNAME / VM_AUTH_PASSWORD are missing.

While verifying the in-place upgrade path on a pre-0.15 monitoring deployment, I found that the documented upgrade flow (pgai mon update, or raw docker compose pull && docker compose up -d) leaves .env un-migrated, so sink-prometheus exits immediately with:

fatal cannot read "/postgres_ai_configs/prometheus/prometheus.yml":
cannot expand environment variables: missing "VM_AUTH_USERNAME" env var

The greenfield mon local-install path already handles this. This MR extends the same contract to the in-place upgrade entry points.

What changed

  • New helper ensureRequiredEnvVars(projectDir) in cli/bin/postgres-ai.ts. Single source of truth for "which keys must exist in .env for the stack to start." Reads .env, appends any missing keys with safe random defaults, writes back with 0600 perms. Purely additive: existing values are preserved verbatim. Idempotent: a second call on a fully-populated .env is a no-op.
  • mon update-config now calls the helper before running sources-generator. Existing 0.14 users running pgai mon update-config to apply config changes also pick up the new env contract.
  • mon update now calls the helper as its first step. It also no longer requires a git checkout: when there's no .git (the npm-installed CLI case, which is the common one), it skips git fetch/pull and goes straight to the env migration + docker compose pull.
  • README "Upgrading" section updated to mention that mon update / mon update-config now migrate .env for newly-required keys, and that the manual upgrade flow should call mon update-config once to fill them in.
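For reviewers who want the contract in one place, here is a minimal sketch of what a helper like ensureRequiredEnvVars could look like. It is not the code in this MR. The REQUIRED_ENV_DEFAULTS table, the addMissingEnvKeys split, the "vmauth" default username, and the hex password format are all assumptions made for illustration. It shows the three properties described above: additive, idempotent, and safe on a .env without a trailing newline.

```typescript
import * as crypto from "node:crypto";
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical table: required key -> generator for a default value.
// The real defaults used by the CLI may differ.
const REQUIRED_ENV_DEFAULTS: Record<string, () => string> = {
  VM_AUTH_USERNAME: () => "vmauth", // assumed default, not taken from the MR
  VM_AUTH_PASSWORD: () => crypto.randomBytes(24).toString("hex"),
};

// Pure step: returns the new .env content and the keys that were appended.
// Existing lines are never modified, so existing values survive verbatim.
export function addMissingEnvKeys(content: string): { content: string; added: string[] } {
  const present = new Set<string>();
  for (const line of content.split(/\r?\n/)) {
    // Matches KEY=... and export KEY=...; commented-out lines do not match.
    const m = /^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=/.exec(line);
    if (m && m[1]) present.add(m[1]);
  }

  const missing = Object.entries(REQUIRED_ENV_DEFAULTS).filter(([key]) => !present.has(key));
  if (missing.length === 0) return { content, added: [] }; // idempotent no-op

  // If the file does not end with a newline, add one first so the new key
  // is not glued onto the previous line.
  let out = content;
  if (out.length > 0 && !out.endsWith("\n")) out += "\n";
  for (const [key, generate] of missing) out += `${key}=${generate()}\n`;

  return { content: out, added: missing.map(([key]) => key) };
}

// File-level wrapper: reads .env, appends missing keys, writes back with 0600.
// Returns the list of keys it added so the caller can print them.
export function ensureRequiredEnvVars(projectDir: string): string[] {
  const envPath = path.join(projectDir, ".env");
  const existing = fs.existsSync(envPath) ? fs.readFileSync(envPath, "utf8") : "";
  const { content, added } = addMissingEnvKeys(existing);
  if (added.length > 0) {
    fs.writeFileSync(envPath, content, { encoding: "utf8", mode: 0o600 });
    fs.chmodSync(envPath, 0o600); // mode on write only applies when the file is created
  }
  return added;
}
```

Keeping the string transformation separate from the file I/O is a design choice of this sketch only; it makes the trailing-newline and no-rotation cases easy to check without touching disk.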

What was intentionally NOT changed

  • mon local-install env-writing logic is untouched. It already preserves and generates VM_AUTH_* correctly via a different (rewrite-the-file) model and is well-tested. DRY-ing it up would expand scope without changing observable behavior.
  • docker-compose.yml already passes VM_AUTH_USERNAME / VM_AUTH_PASSWORD into sink-prometheus and grafana (commit 46ed2f36, "feat(security): add HTTP Basic Auth to VictoriaMetrics"). No compose changes needed.
  • scripts/rotate-vm-auth.sh is unchanged; it remains the right tool for rotating an existing password.

Tests

cli/test/upgrade.test.ts gains four new cases (all pass locally, see bun test test/upgrade.test.ts):

  1. mon update-config appends missing VM_AUTH_* to a 0.14-shaped .env and preserves PGAI_TAG + GF_SECURITY_ADMIN_PASSWORD.
  2. mon update appends missing VM_AUTH_* to a 0.14-shaped .env and prints what it added.
  3. mon update preserves existing VM_AUTH_* values verbatim (no rotation on upgrade).
  4. mon update-config handles a .env without a trailing newline without gluing keys onto the previous line.

Full upgrade suite: 19 pass, 0 fail. Full CLI suite delta vs. main: +4 passing tests, same 3 pre-existing failures (ajv/dist/2020 module missing in two test files; checkup-api HTTP transport timeout — all unrelated).

Test plan

  • cd cli && bun test test/upgrade.test.ts passes (19/19)
  • On a real pre-0.15 deployment whose .env lacks VM_AUTH_*: pgai mon update-config adds the two keys, and pgai mon restart succeeds with sink-prometheus and grafana both healthy
  • On a 0.15 deployment whose .env already has VM_AUTH_*: pgai mon update reports .env is up to date and does not rotate credentials

Per SOC2, please review issue #203 alongside this MR.