PG monitoring review: Dashboard #1

For the current monitoring, we need to review:

  • Dashboard #1 (discussed above):
    • TPS (transactions per second)
    • QPS (queries per second)
    • Latency (average query duration)
    • Longest transaction time
    • Connections (total count and by state from pg_stat_activity)
    • Error count (absolute number of ROLLBACKs)
    • COMMIT vs. ROLLBACK ratio
    • Buffer pool efficiency – hits vs. reads ratio
    • Database IO time
    • Temp bytes written
    • Replication lag
    • Count of WALs waiting to be archived (a.k.a. archive_command lag) // there is "seconds since last WAL archive" in the old WAL-E dashboard, but its meaning is different from "lag size"
    • WAL generation rates // there is "WAL-G completed per second" on the old WAL-E dashboard, but that measures archiving, not generation
    • Locks (exclusive locks count)
    • Wraparound: Transaction ID age (exists in the "patroni: Overview" dashboard)
    • Wraparound: Multixact ID age
    • Database Size // Nik: I'm sure I have seen it somewhere many times but cannot find it now
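
Several of the items above (TPS, ROLLBACK count, COMMIT vs. ROLLBACK ratio, buffer hits vs. reads, temp bytes written) are not stored as rates or ratios. They would have to be derived from cumulative counters, for example the pg_stat_database columns xact_commit, xact_rollback, blks_hit, blks_read and temp_bytes. A minimal sketch of that derivation from two consecutive samples follows. The DbStats class, the derive function and the ts field are hypothetical names used only for illustration; how the samples are collected (exporter, scrape interval) is not specified here and is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DbStats:
    """One sample of cumulative pg_stat_database counters (hypothetical container)."""
    ts: float           # sample time in seconds (assumed field, not a PG column)
    xact_commit: int
    xact_rollback: int
    blks_hit: int
    blks_read: int
    temp_bytes: int


def derive(prev: DbStats, cur: DbStats) -> dict:
    """Turn two counter samples into per-interval rates and ratios."""
    dt = cur.ts - prev.ts
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")

    commits = cur.xact_commit - prev.xact_commit
    rollbacks = cur.xact_rollback - prev.xact_rollback
    hits = cur.blks_hit - prev.blks_hit
    reads = cur.blks_read - prev.blks_read
    temp = cur.temp_bytes - prev.temp_bytes

    # A negative delta means the counters were reset between samples
    # (for example by pg_stat_reset()), so the interval is unusable.
    if min(commits, rollbacks, hits, reads, temp) < 0:
        raise ValueError("counter reset detected between samples")

    xacts = commits + rollbacks
    blocks = hits + reads
    ratio: Optional[float] = hits / blocks if blocks else None
    return {
        "tps": xacts / dt,
        "rollbacks": rollbacks,
        "rollback_ratio": rollbacks / xacts if xacts else 0.0,
        "buffer_hit_ratio": ratio,  # None when no blocks were touched
        "temp_bytes_per_sec": temp / dt,
    }


if __name__ == "__main__":
    a = DbStats(ts=0.0, xact_commit=1000, xact_rollback=10,
                blks_hit=9000, blks_read=1000, temp_bytes=0)
    b = DbStats(ts=10.0, xact_commit=1490, xact_rollback=20,
                blks_hit=9900, blks_read=1100, temp_bytes=5000)
    print(derive(a, b))
```

Working on deltas rather than raw counter values matters for the ratios too: a hit ratio computed from lifetime totals barely moves once the counters are large, while the per-interval ratio shows current behavior.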

Acceptance criteria:

  • Fix the metrics and the dashboard