feat: add xmin horizon monitoring
Summary
- add `xmin_horizon` and `xmin_horizon_blockers` pgwatch metrics
- move the xmin horizon panels into the existing `07. Autovacuum, xmin horizon, bloat` Grafana dashboard and remove the standalone dashboard 14
- add the existing long-running transaction age graph plus xmin overview/counts/top-blockers below the autovacuum activity panel
- keep an experimental xmin horizon age panel on the first node overview dashboard, pointing details to dashboard 07
- enrich the current blocker table with activity `queryid`, slot name/type/status/version-optional slot fields, standby name, and prepared transaction `gid`/owner while keeping query text out of Prometheus
- add integration coverage for `pg_stat_activity`, replication slots, prepared xacts, standby feedback, null-xmin slot handling, and non-client backend exclusion
- preserve the existing `replication_slots.xmin_age_tx` null behavior while coalescing null slot ages only inside the new `xmin_horizon` metrics
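The null-handling split in the last bullet can be pictured with a minimal Python sketch. It is illustrative only: the real logic lives in the metric SQL, the function names are hypothetical, and the fallback value of `0` is an assumption (this description does not say what null slot ages are coalesced to).

```python
from typing import Optional


def existing_slot_age(xmin_age: Optional[int]) -> Optional[int]:
    """Existing metric: a slot without an xmin keeps reporting null (None)."""
    return xmin_age


def horizon_slot_age(xmin_age: Optional[int], default: int = 0) -> int:
    """New xmin horizon metrics: a null slot age is coalesced to a number.

    The default of 0 is an assumed placeholder, not a value taken from the change.
    """
    return default if xmin_age is None else xmin_age


if __name__ == "__main__":
    for age in (None, 1500):
        print(age, existing_slot_age(age), horizon_slot_age(age))
```

Keeping the coalescing inside the new metrics means anything that already relies on a null value in the existing metric should see no change.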
Why
Traditional long-running transaction monitoring is not enough to explain xmin horizon pressure. This change exposes the four main blocker classes separately and adds actionable top-blocker details for RCA without storing query text as a Prometheus label.
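As a rough illustration of why per-class visibility helps, the sketch below groups blockers by class and ranks the oldest ones. The four class labels are assumptions inferred from the integration coverage list above, and the `Blocker` type and both helper functions are hypothetical; none of this is the metric's actual SQL.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical class labels, inferred from the integration coverage list.
BLOCKER_CLASSES = ("backend", "replication_slot", "prepared_xact", "standby_feedback")


@dataclass(frozen=True)
class Blocker:
    blocker_class: str
    xmin_age: int
    detail: str  # e.g. a queryid, slot name, or gid; never query text


def oldest_per_class(blockers: Iterable[Blocker]) -> Dict[str, int]:
    """Return the largest xmin age per class (0 when a class has no blocker)."""
    ages = {name: 0 for name in BLOCKER_CLASSES}
    for blocker in blockers:
        if blocker.blocker_class not in ages:
            raise ValueError(f"unknown blocker class: {blocker.blocker_class}")
        ages[blocker.blocker_class] = max(ages[blocker.blocker_class], blocker.xmin_age)
    return ages


def top_blockers(blockers: Iterable[Blocker], limit: int = 5) -> List[Blocker]:
    """Return the blockers holding the horizon back the most, oldest first."""
    return sorted(blockers, key=lambda b: b.xmin_age, reverse=True)[:limit]
```

In this made-up scenario, a long-running transaction monitor would only surface the `backend` rows, while a stale replication slot with a much larger age would go unnoticed.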
Validation
- loaded dashboard JSON files with `python3`, including dashboard 1 and dashboard 07 via the Helm dashboard symlinks
- parsed `config/pgwatch-prometheus/metrics.yml` and verified both new metrics exist
- executed `xmin_horizon_blockers` SQL against a live local PostgreSQL target on `localhost:55432`
- compiled `tests/xmin_horizon/test_xmin_horizon_metric.py` and `tests/xmin_horizon/test_metrics_sql_static.py` with `python3 -m py_compile`
- ran `python3 -m unittest discover -s tests/xmin_horizon -p 'test_metrics_sql_static.py'`
- ran `bash -n tests/xmin_horizon/run_test.sh`
- ran `git diff --check`
- latest GitLab pipeline `2469906125` passed for commit `6df21e6`, including `integration:xmin-horizon` and `validate-helm-chart`
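For reviewers who want to repeat the dashboard JSON check, a minimal sketch of one way to do it with the standard library follows. The helper name `check_dashboards` is hypothetical and this is not necessarily the exact command used above; pass whichever dashboard paths apply in your checkout.

```python
import json
import sys
from pathlib import Path
from typing import Dict, Iterable


def check_dashboards(paths: Iterable[str]) -> Dict[str, str]:
    """Parse each file as JSON and return {path: error message} for failures."""
    errors: Dict[str, str] = {}
    for path in map(Path, paths):
        try:
            # Path.open() follows symlinks, so symlinked dashboards are read too.
            with path.open(encoding="utf-8") as handle:
                json.load(handle)
        except (OSError, ValueError) as exc:  # JSONDecodeError is a ValueError
            errors[str(path)] = str(exc)
    return errors


if __name__ == "__main__":
    failures = check_dashboards(sys.argv[1:])
    for name, message in failures.items():
        print(f"{name}: {message}")
    sys.exit(1 if failures else 0)
```

The script exits non-zero when any file fails to parse, which makes it easy to drop into a local pre-push check.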
Closes #166
Edited by Maya P