fix(metrics): port "pgwatch postgresai edition"'s top-N + 'other' bucket to pg_stat/statio_all_*

The four per-relation metrics (pg_stat_all_indexes, pg_stat_all_tables, pg_statio_all_tables, pg_statio_all_indexes) had no schema filter and a flat LIMIT 5000 truncation. On extension- or schema-heavy databases this overran prometheus.yml's sample_limit (10000) so the entire scrape was silently rejected, and the LIMIT tail was dropped without any aggregate row left behind — dashboard sums drifted.

Port the gen2 (gitlab.com/postgres-ai/pgwatch2) approach faithfully instead of reinventing it:

  • Read pg_stat_all_/pg_statio_all_ — keep pg_catalog, pg_toast and _timescaledb_internal visible (the _user_ variants hide them, which would blind us to autovacuum / TOAST / Timescale-chunk activity that this MR is designed to surface). Schema-cardinality is controlled by the top-N cap below, not by a hand-curated nspname LIKE pattern.
  • row_number() OVER (ORDER BY ) <= 100 per database. Rank tables by pg_class.relpages (catalog-cached, O(1) join — big tables are the interesting ones; n_live_tup+n_dead_tup starved big-but-static tables, and calling pg_total_relation_size(relid) per row blew past statement_timeout_seconds on extension-heavy clusters and could fail on concurrently dropped relations). Indexes/IO views rank by activity. Every window has a stable tiebreaker (schemaname, relname[, indexrelname]) so zero-activity ties don't churn top-N membership between scrapes.
  • UNION ALL an $other$ row that sums the tail so totals stay correct under the cap. HAVING count(*) > 0 (with explicit group by ()) suppresses the row when nothing was truncated. The $other$ sentinel cannot collide with a real Postgres identifier (the $ character is not legal in unquoted identifiers), avoiding duplicate-series risk.
  • Skip rows with no I/O activity in the statio views — most of the tail on schema-heavy DBs is dead-cold relations.

Metric names and exposed tag_* labels are unchanged so Dashboards 8–11 keep working. Adds compliance-vector tests that pin the pattern and an integration test that executes each rewritten SQL against a real PostgreSQL.

Closes #190 (closed)

Edited by Denis Morozov

Merge request reports

Loading