PG Monitoring review Dashboard 1
Related to the current monitoring we need to review:
- Dashboard #1 (discussed above):
  - ✅ TPS (transactions per second)
  - ⛔ QPS (queries per second)
  - ⛔ latency (average query duration)
  - ⛔ longest transaction time
  - ✅ Connections (total count and by state from `pg_stat_activity`)
  - ✅ Error count (ROLLBACKs absolute value)
  - ⛔ COMMIT vs. ROLLBACK ratio
  - ⛔ Buffer pool efficiency – hits vs. reads ratio
  - ⛔ Database IO time
  - ⛔ Temp bytes written
  - ✅ Replication lag
  - ⛔ Count of WALs waiting to be archived (a.k.a. `archive_command` lag) // there is "seconds since last WAL archive" in the old WAL-E dashboard, but its meaning is different from "lag size"
  - ⛔ WAL generation rates // there is "WAL-G completed per second" on the old WAL-E dashboard, but that measures archiving, not generation
  - ✅ Locks (exclusive locks count)
  - ✅ Wraparound: Transaction ID age (exists in the "patroni: Overview" dashboard)
  - ⛔ Wraparound: Transaction Multixact ID age
  - ⛔ Database Size // Nik: I'm sure I have seen it somewhere many times but cannot find it now
- Acceptance criteria:
  - Fix the metrics and the dashboard
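
To make a few of the missing (⛔) items more concrete, here is a minimal sketch of how they could be derived. It is an illustration under assumptions, not the exporter configuration actually in use: the names `CANDIDATE_QUERIES`, `DbStatSnapshot` and `derive_panel_values` are hypothetical, and "rollback ratio" is defined here as the share of rollbacks among all finished transactions, which is only one possible reading of "COMMIT vs. ROLLBACK ratio". The SQL relies on standard PostgreSQL views and functions (`pg_stat_activity`, `pg_database`, `mxid_age`, `pg_database_size`, `pg_wal_lsn_diff`, `pg_ls_archive_statusdir`); the counters come from `pg_stat_database`.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Candidate queries for some of the missing panels (assumed, not taken from
# the existing dashboards or exporters).
CANDIDATE_QUERIES: Dict[str, str] = {
    # Age in seconds of the oldest open transaction.
    "longest_transaction_seconds": """
        SELECT coalesce(max(extract(epoch FROM now() - xact_start)), 0)
        FROM pg_stat_activity
        WHERE xact_start IS NOT NULL
    """,
    # Multixact ID age per database (wraparound risk).
    "multixact_id_age": "SELECT datname, mxid_age(datminmxid) FROM pg_database",
    # On-disk size per database, in bytes.
    "database_size_bytes": "SELECT datname, pg_database_size(datname) FROM pg_database",
    # Total WAL bytes written so far; the generation rate is the delta of this
    # value over time. Works on the primary only.
    "wal_bytes_total": "SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0')",
    # WAL files not yet processed by archive_command (PostgreSQL 12+).
    "wals_waiting_to_be_archived": """
        SELECT count(*) FROM pg_ls_archive_statusdir() WHERE name LIKE '%.ready'
    """,
}


@dataclass
class DbStatSnapshot:
    """One reading of the cumulative pg_stat_database counters."""
    taken_at: float       # wall-clock time of the reading, in seconds
    xact_commit: int
    xact_rollback: int
    blks_hit: int
    blks_read: int
    temp_bytes: int


def derive_panel_values(prev: DbStatSnapshot,
                        curr: DbStatSnapshot) -> Dict[str, Optional[float]]:
    """Turn two counter snapshots into per-interval rates and ratios."""
    elapsed = curr.taken_at - prev.taken_at
    if elapsed <= 0:
        raise ValueError("snapshots must be taken in increasing time order")

    commits = curr.xact_commit - prev.xact_commit
    rollbacks = curr.xact_rollback - prev.xact_rollback
    hits = curr.blks_hit - prev.blks_hit
    reads = curr.blks_read - prev.blks_read
    xacts = commits + rollbacks
    blocks = hits + reads

    return {
        "tps": xacts / elapsed,
        # None means "undefined for this interval" (nothing happened).
        "rollback_ratio": rollbacks / xacts if xacts else None,
        "buffer_hit_ratio": hits / blocks if blocks else None,
        "temp_bytes_per_second": (curr.temp_bytes - prev.temp_bytes) / elapsed,
    }
```

In practice the same deltas would more likely be computed by the monitoring system's own rate functions over the exported counters; the Python version only spells out the arithmetic. Counter resets (for example after `pg_stat_reset()`) are not handled in this sketch.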