docs(monitoring): cover 0.15.0 features (xmin horizon, pg_stat_io, upgrade, security)
What this MR does
Closes the PostgresAI 0.15.0 documentation coverage gaps found by the release docs audit. Every HIGH and MEDIUM gap is addressed, plus the easy LOW ones. Content is based on the 0.15.0 release notes and verified against the actual source behavior in `postgres-ai/postgresai` (metric definitions in `config/pgwatch-prometheus/metrics.yml`, `.env.example`, `docker-compose.yml`, `cli/`), so flag names, defaults, and metric labels are concrete and not invented.
Related release: postgresai#186 (0.15.0).
The Docusaurus build (`bun run build`) passes; all new pages are registered in `sidebars.js`, and internal anchors have been verified.
New pages
- `docs/monitoring/dashboards/14-io-statistics.md`: Dashboard 14 (I/O statistics, `pg_stat_io`, PG16+). Fixes the off-by-one where the overview claimed "14 dashboards" but listed 13.
- `docs/monitoring/getting-started/upgrade.md`: monitoring-stack upgrade guide covering `mon update` / `mon update-config`, additive value-preserving `.env` migration, the new required `VM_AUTH_USERNAME`/`VM_AUTH_PASSWORD` keys, and an explicit manual-Docker-Compose callout.
- `docs/monitoring/advanced/telemetry.md`: monitoring telemetry reporter covering what it sends, that it is off unless configured, the env vars to enable/disable, and a privacy note.
- `docs/monitoring/advanced/security.md`: monitoring security covering VM basic auth, credential rotation, supply-chain pinning, and at-rest encryption.
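For reviewers unfamiliar with the term, "additive value-preserving" migration can be sketched as below. This is an illustration only, not the CLI's implementation: the function name `merge_env`, the key `EXISTING_KEY`, and the `changeme` placeholder values are hypothetical. Only `VM_AUTH_USERNAME` and `VM_AUTH_PASSWORD` come from this MR.

```python
def _env_key(line: str):
    """Return the key of a KEY=VALUE line, or None for blanks and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#") or "=" not in stripped:
        return None
    return stripped.split("=", 1)[0].strip()


def merge_env(existing: str, template: str) -> str:
    """Append keys that exist only in the template; never touch existing values."""
    present = {k for k in map(_env_key, existing.splitlines()) if k}
    additions = []
    for line in template.splitlines():
        key = _env_key(line)
        if key and key not in present:
            additions.append(line.strip())
            present.add(key)
    if not additions:
        return existing  # nothing to add: the file is returned unchanged
    base = existing if not existing or existing.endswith("\n") else existing + "\n"
    return base + "\n".join(additions) + "\n"


# Hypothetical inputs: a user-customized .env and a newer template.
current = "EXISTING_KEY=custom\n"
template = "EXISTING_KEY=default\nVM_AUTH_USERNAME=changeme\nVM_AUTH_PASSWORD=changeme\n"
merged = merge_env(current, template)
```

In this sketch the customized `EXISTING_KEY=custom` survives, and only the two missing keys are appended.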
Expanded / corrected pages
- Metrics reference: documented `xmin_horizon`, `xmin_horizon_blockers`, `pg_stat_io`, `pg_wal_size`, and `lock_waits` (with the new `blocked_pid`/`blocker_pid` labels); removed them from the "not yet documented here" note.
- Dashboard 07 rewritten as "Autovacuum and xmin horizon": dropped the stale "Dashboard in development" banner and the pre-redesign "Tables approaching wraparound" framing; added xmin-horizon RCA (blocker classes, data vs catalog horizon, blocker attribution).
- Dashboard 13: lock-wait metrics now documented as carrying `blocked_pid`/`blocker_pid` labels usable directly in Grafana/PromQL.
- Dashboards index: fixed 07 label, added Dashboard 14 row, added a Top-N + `$other$` explainer (cross-linked from 08/10), updated time-range tip to the new `now-1h` default.
- prometheus-config: new Authentication & security section (VM auth + rotation), `QUERYID_RETENTION_HOURS`, reconciled VM query/search flag names to the canonical `VM_QUERY_DURATION`/`VM_MAX_CONCURRENT_REQUESTS`.
- configuration/index: per-service resource-limit reference table; idempotent config-seeding note.
- grafana-config: Desert Bloom default theme + revert guidance.
- installation-docker: pinned `grafana:12.3.2` (was `:latest`), image-pinning / supply-chain note, restart-behavior note, VM_AUTH context.
- installation-helm: 0.15 chart-values note, VM basic-auth secret wiring, retention 90d vs 336h reconciliation.
- requirements: Node.js 18+ CLI runtime, PG16+ note for `pg_stat_io`, retention reconciled with `QUERYID_RETENTION_HOURS`.
- CLI reference: Node 18+ requirement, F004/F005 + `prepare-db` hint, stdout/stderr separation for `--json`/`--markdown`, 15 MCP tools (issues / action items / reports / files), `--project` Console registration, stale `--tag` 0.14.0→0.15.0, expanded `mon update`/`update-config`.
- how-to-install-mcp: verify-installation tool list expanded to all 15 MCP tools.
- installation-cloud: Node 18+ prerequisite.
- permissions: renamed "Rotate credentials" → "Rotate monitoring database credentials" to disambiguate from VM auth rotation.
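As context for the Top-N + `$other$` explainer mentioned above, here is a minimal sketch of the general idea. It assumes the bucket is the sum of every series outside the top N. This MR does not state how the dashboards compute it, so treat the aggregation, the function name `top_n_with_other`, and the sample query names as hypothetical.

```python
def top_n_with_other(values: dict, n: int, other_label: str = "$other$") -> dict:
    """Keep the n largest entries and fold the remainder into one bucket."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    result = dict(ranked[:n])
    rest = ranked[n:]
    if rest:
        # Assumption: the remainder is summed, so the chart total is preserved.
        result[other_label] = sum(v for _, v in rest)
    return result


# Hypothetical per-query totals; only the top two stay individually visible.
sample = {"q1": 50, "q2": 30, "q3": 15, "q4": 5}
bucketed = top_n_with_other(sample, 2)
```

Under that assumption, the sum of `bucketed` equals the sum of `sample`, so a stacked panel keeps the same total height.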
Deferred
- Encrypted-at-rest infrastructure checks (!216 (merged)) as a standalone how-to. The shipped checks are Terraform/KMS infrastructure tests for hosted infra (CI-level), not a user-runnable command; documenting "how to run/interpret" would require inventing user-facing behavior. The user-facing security surface (at-rest guidance for self-hosted volumes + hosted KMS note) is covered in the new `advanced/security.md` instead.
- Public Helm release-process page (!132 (merged)). Kept to a chart-values/upgrade note in `installation-helm.md`; whether the full release process is public is a maintainer call.
0.15.0 documentation coverage matrix
| # | 0.15 Feature (MR) | Status (pre-MR) | Addressed here |
|---|---|---|---|
| 1 | xmin horizon + Dashboard 7 redesign (!242 (merged)/!254 (merged)/!257 (merged)/!277 (merged)) | MISSING | |
| 2 | pg_wal_size (!243 (merged)) | MISSING | |
| 3 | pg_stat_io metric group (!168 (merged)) | MISSING | |
| 4 | Dashboard 14 I/O statistics (!168 (merged)) | MISSING | |
| 5 | Lock-wait blocked_pid/blocker_pid (!278 (merged)) | MISSING | |
| 6 | Desert Bloom theme (!221 (merged)) | MISSING | |
| 7 | prepare-db hint on bloat checks (!214 (merged)) | MISSING | |
| 8 | Encrypted-at-rest checks (!216 (merged)) | MISSING | |
| 9 | QUERYID_RETENTION_HOURS (!259 (merged)) | MISSING | |
| 10 | Monitoring telemetry reporter (!251 (merged)) | MISSING | |
| 11 | Upgrade procedure + VM_AUTH keys (!270 (merged)/!170 (merged)) | MISSING | |
| 12 | Restart policies / reboot persistence (!246 (merged)/!239 (merged)) | MISSING | |
| 13 | Idempotent config seeding (!250 (merged)) | MISSING | |
| 14 | top-N + $other$ bucket (!262 (merged)/!267 (merged)) | PARTIAL | |
| 16 | Dashboard catalog accuracy (index.md) | PARTIAL | |
| 17 | Checkup F004/F005 in CLI (!180 (merged)) | PARTIAL | |
| 18 | Checkup stderr / cleaner --output (!213 (merged)) | PARTIAL | |
| 19 | --project Console registration (!210 (merged)) | PARTIAL | |
| 20 | MCP tool catalog 15 tools (!211 (merged)/!215 (merged)) | PARTIAL | |
| 21 | VM basic auth purpose/why (!217 (merged)) | PARTIAL | |
| 22 | rotate-vm-auth discoverability (!217 (merged)) | PARTIAL | |
| 23 | Per-service resource controls (!252 (merged)/!238 (merged)/!248 (merged)) | PARTIAL | |
| 24 | VM query/search flag names (!252 (merged)) | PARTIAL | |
| 25 | Node.js 18+ (!201 (merged)) | PARTIAL | |
| 26 | requirements.md 0.15 accuracy | PARTIAL | |
| 27 | Pinned image tags (!233 (merged)) | PARTIAL | |
| 28 | Helm 0.15 chart defaults (!280 (merged)/!132 (merged)/!218 (merged)) | MISSING | |
Do not merge — for review.
Closes #223 (closed)
Edited by Maya P