feat: add xmin horizon monitoring

Summary

  • add xmin_horizon and xmin_horizon_blockers pgwatch metrics
  • move the xmin horizon panels into the existing "07. Autovacuum, xmin horizon, bloat" Grafana dashboard and remove the standalone dashboard 14
  • add the existing long-running transaction age graph, plus xmin horizon overview, blocker counts, and top-blockers panels, below the autovacuum activity panel
  • keep an experimental xmin horizon age panel on dashboard 1 (node overview), pointing to dashboard 07 for details
  • enrich the current-blockers table with the activity queryid; the slot name, type, and status, plus slot fields that exist only on some PostgreSQL versions; the standby name; and the prepared transaction gid and owner, all while keeping query text out of Prometheus
  • add integration coverage for pg_stat_activity, replication slots, prepared xacts, standby feedback, null-xmin slot handling, and non-client backend exclusion
  • preserve the existing replication_slots.xmin_age_tx null behavior while coalescing null slot ages only inside the new xmin_horizon metrics
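The enrichment and null-handling bullets above can be sketched in Python, the language of the MR's tests. This is a minimal sketch under assumptions: the label names, `BLOCKER_LABELS`, `replication_slots_xmin_age_tx`, `coalesced_age`, and `blocker_sample` are hypothetical and are not taken from the MR. It only shows the intended split: identifiers such as queryid become labels while query text never does, and a null slot age stays null in the existing metric but is coalesced to 0 in the new one.

```python
from typing import Optional

# Hypothetical label whitelist for one blocker sample: identifiers only.
# Query text is deliberately absent, so it can never become a Prometheus label.
BLOCKER_LABELS = ("blocker_type", "pid", "queryid", "slot_name", "slot_type",
                  "standby_name", "gid", "owner")


def replication_slots_xmin_age_tx(xmin_age: Optional[int]) -> Optional[int]:
    """Existing metric (sketch): a slot without an xmin keeps reporting null (None)."""
    return xmin_age


def coalesced_age(xmin_age: Optional[int]) -> int:
    """New xmin_horizon metrics only (sketch): a null age becomes 0, like SQL COALESCE(age, 0)."""
    return xmin_age if xmin_age is not None else 0


def blocker_sample(row: dict) -> tuple:
    """Turn one blocker row into (labels, value), dropping non-whitelisted and null fields."""
    labels = {k: str(row[k]) for k in BLOCKER_LABELS if row.get(k) is not None}
    return labels, coalesced_age(row.get("xmin_age"))
```

For example, a logical slot row with `"xmin_age": None` and a stray `"query"` key yields labels containing only `blocker_type`, `slot_name`, and `slot_type`, with a sample value of 0, while `replication_slots_xmin_age_tx(None)` still returns `None`.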

Why

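Monitoring long-running transactions alone does not explain xmin horizon pressure: besides client backends in pg_stat_activity, the horizon can also be held back by replication slots, prepared transactions, and hot standby feedback. This change exposes these four blocker classes separately and adds actionable top-blocker details for RCA without storing query text as a Prometheus label.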
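A small worked illustration of this argument, as a hypothetical Python sketch: the horizon is held back by the oldest xmin across all four classes, so a transaction-only view can understate it badly. `Blocker`, `summarize`, the class strings, and the sample ages are invented for illustration, and skipping null ages here is an assumption, not the MR's SQL logic.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical names for the four blocker classes exposed by this MR.
CLASSES = ("activity", "replication_slot", "prepared_xact", "standby_feedback")


@dataclass
class Blocker:
    cls: str                 # one of CLASSES
    ident: str               # pid, slot name, prepared xact gid, or standby name
    xmin_age: Optional[int]  # age of the held xmin in transactions; None if none


def summarize(blockers: List[Blocker], top_n: int = 3):
    """Return (horizon age, per-class counts, top-N blockers), ignoring null ages."""
    aged = [b for b in blockers if b.xmin_age is not None]
    # The horizon is held back by the oldest xmin across *all* classes.
    horizon_age = max((b.xmin_age for b in aged), default=0)
    counts = Counter(b.cls for b in aged)
    per_class = {c: counts[c] for c in CLASSES}
    top = sorted(aged, key=lambda b: b.xmin_age, reverse=True)[:top_n]
    return horizon_age, per_class, top


if __name__ == "__main__":
    blockers = [
        Blocker("activity", "pid 4242", 1_800),
        Blocker("replication_slot", "slot_a", 250_000),
        Blocker("prepared_xact", "gid_x", 90_000),
        Blocker("replication_slot", "slot_b", None),
    ]
    age, per_class, top = summarize(blockers)
    # A transaction-only view would report 1800; the horizon age here is 250000.
    print(age, per_class, [b.ident for b in top])
```

Running it reports a horizon age of 250000 and ranks slot_a first, even though the only transaction blocker has an age of 1800.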

Validation

  • loaded the dashboard JSON files with python3, including dashboards 1 and 07 via the Helm dashboard symlinks
  • parsed config/pgwatch-prometheus/metrics.yml and verified that both new metrics exist
  • executed the xmin_horizon_blockers SQL against a live local PostgreSQL target on localhost:55432
  • compiled tests/xmin_horizon/test_xmin_horizon_metric.py and tests/xmin_horizon/test_metrics_sql_static.py with python3 -m py_compile
  • ran python3 -m unittest discover -s tests/xmin_horizon -p 'test_metrics_sql_static.py'
  • ran bash -n tests/xmin_horizon/run_test.sh
  • ran git diff --check
  • latest GitLab pipeline 2469906125 passed for commit 6df21e6, including integration:xmin-horizon and validate-helm-chart
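A static SQL check of the kind run by the unittest step could look roughly like the following. It is a hypothetical sketch, not the contents of test_metrics_sql_static.py: `EXAMPLE_SQL` is an illustrative stand-in query rather than the one in config/pgwatch-prometheus/metrics.yml, and `static_problems` and its plain-text checks are invented for this example.

```python
import re
from typing import List

# Illustrative stand-in for the xmin_horizon_blockers SQL; NOT the MR's query.
# pg_stat_activity.query_id exists on PostgreSQL 14 and later.
EXAMPLE_SQL = """
select 'activity' as blocker_type, query_id as queryid, age(backend_xmin) as xmin_age
  from pg_stat_activity
 where backend_type = 'client backend' and backend_xmin is not null
union all
select 'replication_slot', null, coalesce(age(xmin), 0) from pg_replication_slots
union all
select 'prepared_xact', null, age(transaction) from pg_prepared_xacts
union all
select 'standby_feedback', null, age(backend_xmin) from pg_stat_replication
 where backend_xmin is not null
"""

REQUIRED_SOURCES = ("pg_stat_activity", "pg_replication_slots",
                    "pg_prepared_xacts", "pg_stat_replication")


def static_problems(sql: str) -> List[str]:
    """Return problems found by simple text checks (an empty list means OK)."""
    text = sql.lower()
    problems = [f"missing source {s}" for s in REQUIRED_SOURCES if s not in text]
    # \bquery\b matches a bare "query" column, but not query_id or queryid.
    if re.search(r"\bquery\b", text):
        problems.append("selects query text")
    if "backend_type = 'client backend'" not in text:
        problems.append("does not exclude non-client backends")
    return problems
```

Text matching like this is deliberately crude; it catches regressions such as a bare `query` column slipping into the metric, while the live run against PostgreSQL validates the SQL itself.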

Closes #166

Edited by Maya P
