docs(monitoring): cover 0.15.0 features (xmin horizon, pg_stat_io, upgrade, security)

What this MR does

Closes the PostgresAI 0.15.0 documentation coverage gaps found by the release docs audit. Every HIGH and MEDIUM gap is addressed, plus the easy LOW ones. Content is based on the 0.15.0 release notes and verified against the actual source behavior in postgres-ai/postgresai (metric definitions in config/pgwatch-prometheus/metrics.yml, .env.example, docker-compose.yml, cli/), so flag names, defaults, and metric labels are concrete and not invented.

Related release: postgresai#186 (0.15.0).

Docusaurus bun run build passes; all new pages are registered in sidebars.js, and internal anchors have been verified.

New pages

  • docs/monitoring/dashboards/14-io-statistics.md: Dashboard 14 (I/O statistics, pg_stat_io, PG16+). Fixes the off-by-one where the overview claimed "14 dashboards" but listed 13.
  • docs/monitoring/getting-started/upgrade.md: monitoring-stack upgrade guide: mon update / mon update-config, additive value-preserving .env migration, the new required VM_AUTH_USERNAME/VM_AUTH_PASSWORD keys, and an explicit manual-Docker-Compose callout.
  • docs/monitoring/advanced/telemetry.md: monitoring telemetry reporter: what it sends, that it is off unless configured, the env vars to enable/disable, privacy note.
  • docs/monitoring/advanced/security.md: monitoring security: VM basic auth, credential rotation, supply-chain pinning, at-rest encryption.
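
For reviewers, the "additive value-preserving" .env migration covered by the upgrade guide can be pictured roughly as below. This is a minimal sketch of the idea only, not the actual mon update-config implementation: the helper names parse_env and merge_env, the EXISTING_KEY entry, and all sample values are hypothetical. Only the VM_AUTH_USERNAME/VM_AUTH_PASSWORD key names come from this MR.

```python
def parse_env(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


def merge_env(existing_text, template_text):
    """Additive merge: keep every existing value, add keys only the template has."""
    existing = parse_env(existing_text)
    template = parse_env(template_text)
    merged = dict(existing)  # existing user values always win
    added = []
    for key, value in template.items():
        if key not in merged:
            merged[key] = value
            added.append(key)
    return merged, added


# Hypothetical pre-upgrade .env and a newer template that introduces the VM_AUTH keys.
existing = "EXISTING_KEY=user-value\n"
template = "EXISTING_KEY=template-default\nVM_AUTH_USERNAME=\nVM_AUTH_PASSWORD=\n"

merged, added = merge_env(existing, template)
print(merged["EXISTING_KEY"])  # the user's value is preserved
print(added)                   # keys the migration had to add
```

Under this model, a value the user already set is never overwritten, and the keys reported in added are the ones that still need real values after the upgrade.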

Expanded / corrected pages

  • Metrics reference: documented xmin_horizon, xmin_horizon_blockers, pg_stat_io, pg_wal_size, and lock_waits (with the new blocked_pid/blocker_pid labels); removed them from the "not yet documented here" note.
  • Dashboard 07 rewritten as "Autovacuum and xmin horizon" — dropped the stale "Dashboard in development" banner and the pre-redesign "Tables approaching wraparound" framing; added xmin-horizon RCA (blocker classes, data vs catalog horizon, blocker attribution).
  • Dashboard 13: lock-wait metrics now documented as carrying blocked_pid/blocker_pid labels usable directly in Grafana/PromQL.
  • Dashboards index: fixed 07 label, added Dashboard 14 row, added a Top-N + $other$ explainer (cross-linked from 08/10), updated time-range tip to the new now-1h default.
  • prometheus-config: new Authentication & security section (VM auth + rotation), QUERYID_RETENTION_HOURS, reconciled VM query/search flag names to the canonical VM_QUERY_DURATION/VM_MAX_CONCURRENT_REQUESTS.
  • configuration/index: per-service resource-limit reference table; idempotent config-seeding note.
  • grafana-config: Desert Bloom default theme + revert guidance.
  • installation-docker: pinned grafana:12.3.2 (was :latest), image-pinning / supply-chain note, restart-behavior note, VM_AUTH context.
  • installation-helm: 0.15 chart-values note, VM basic-auth secret wiring, retention 90d vs 336h reconciliation.
  • requirements: Node.js 18+ CLI runtime, PG16+ note for pg_stat_io, retention reconciled with QUERYID_RETENTION_HOURS.
  • CLI reference: Node 18+ requirement, F004/F005 + prepare-db hint, stdout/stderr separation for --json/--markdown, 15 MCP tools (issues / action items / reports / files), --project Console registration, stale --tag 0.14.0 → 0.15.0, expanded mon update/update-config.
  • how-to-install-mcp: verify-installation tool list expanded to all 15 MCP tools.
  • installation-cloud: Node 18+ prerequisite.
  • permissions: renamed "Rotate credentials" → "Rotate monitoring database credentials" to disambiguate from VM auth rotation.
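
The Top-N + $other$ explainer added to the dashboards index describes a bucketing pattern that can be illustrated as below. This is a sketch that assumes the usual meaning of the pattern (the N largest series are shown individually and everything else is summed into one bucket). The dashboards do this in their panel queries, not in Python, and the function name, the "other" label string, and the sample numbers are all hypothetical.

```python
def top_n_with_other(values, n, other_label="other"):
    """Keep the n largest entries and sum the remainder into one bucket."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    result = dict(ranked[:n])
    if len(ranked) > n:
        # Everything outside the top n is collapsed so the total is preserved.
        result[other_label] = sum(v for _, v in ranked[n:])
    return result


# Hypothetical per-query call counts.
calls = {"q1": 500, "q2": 300, "q3": 120, "q4": 50, "q5": 30}
result = top_n_with_other(calls, 3)
print(result)
```

The point of the extra bucket is that the panel total stays equal to the sum over all series, so a stacked graph does not silently under-report when the long tail is hidden.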

Deferred

  • Encrypted-at-rest infrastructure checks (!216 (merged)) as a standalone how-to. The shipped checks are Terraform/KMS infrastructure tests for hosted infra (CI-level), not a user-runnable command; documenting "how to run/interpret" would require inventing user-facing behavior. The user-facing security surface (at-rest guidance for self-hosted volumes + hosted KMS note) is covered in the new advanced/security.md instead.
  • Public Helm release-process page (!132 (merged)). Kept to a chart-values/upgrade note in installation-helm.md; whether the full release process is public is a maintainer call.

0.15.0 documentation coverage matrix

| # | 0.15 Feature (MR) | Status (pre-MR) | Addressed here |
|---|---|---|---|
| 1 | xmin horizon + Dashboard 7 redesign (!242 (merged)/!254 (merged)/!257 (merged)/!277 (merged)) | MISSING | 07 rewrite + metrics reference |
| 2 | pg_wal_size (!243 (merged)) | MISSING | metrics reference + system-metrics |
| 3 | pg_stat_io metric group (!168 (merged)) | MISSING | metrics reference + requirements |
| 4 | Dashboard 14 I/O statistics (!168 (merged)) | MISSING | new page + index |
| 5 | Lock-wait blocked_pid/blocking_pid (!278 (merged)) | MISSING | 13 + metrics reference |
| 6 | Desert Bloom theme (!221 (merged)) | MISSING | grafana-config |
| 7 | prepare-db hint on bloat checks (!214 (merged)) | MISSING | CLI reference checkup |
| 8 | Encrypted-at-rest checks (!216 (merged)) | MISSING | deferred; security.md covers user-facing surface |
| 9 | QUERYID_RETENTION_HOURS (!259 (merged)) | MISSING | prometheus-config + requirements + upgrade |
| 10 | Monitoring telemetry reporter (!251 (merged)) | MISSING | new telemetry page |
| 11 | Upgrade procedure + VM_AUTH keys (!270 (merged)/!170 (merged)) | MISSING | new upgrade page |
| 12 | Restart policies / reboot persistence (!246 (merged)/!239 (merged)) | MISSING | installation-docker |
| 13 | Idempotent config seeding (!250 (merged)) | MISSING | configuration/index |
| 14 | top-N + $other$ bucket (!262 (merged)/!267 (merged)) | PARTIAL | index + 08/10 |
| 16 | Dashboard catalog accuracy (index.md) | PARTIAL | count/labels/time-range |
| 17 | Checkup F004/F005 in CLI (!180 (merged)) | PARTIAL | CLI reference |
| 18 | Checkup stderr / cleaner --output (!213 (merged)) | PARTIAL | CLI reference |
| 19 | --project Console registration (!210 (merged)) | PARTIAL | CLI reference |
| 20 | MCP tool catalog 15 tools (!211 (merged)/!215 (merged)) | PARTIAL | CLI reference + MCP howto |
| 21 | VM basic auth purpose/why (!217 (merged)) | PARTIAL | prometheus-config + install + security |
| 22 | rotate-vm-auth discoverability (!217 (merged)) | PARTIAL | prometheus-config + security |
| 23 | Per-service resource controls (!252 (merged)/!238 (merged)/!248 (merged)) | PARTIAL | configuration/index table |
| 24 | VM query/search flag names (!252 (merged)) | PARTIAL | reconciled to canonical names |
| 25 | Node.js 18+ (!201 (merged)) | PARTIAL | requirements + CLI ref + cloud |
| 26 | requirements.md 0.15 accuracy | PARTIAL | Node, PG16, retention |
| 27 | Pinned image tags (!233 (merged)) | PARTIAL | installation-docker + CLI --tag fix |
| 28 | Helm 0.15 chart defaults (!280 (merged)/!132 (merged)/!218 (merged)) | MISSING | installation-helm notes (release-process deferred) |

Do not merge — for review.

🤖 Generated with Claude Code

Closes #223 (closed)

Edited by Maya P
