fix(metrics): extend !262 top-N + $other$ to remaining per-relation metrics

Summary

Port the gen2 "rank-then-cap + '$other$' aggregate row" pattern that !262 (merged) ships for the pg_stat_all_* / pg_statio_all_* family to the remaining per-relation metrics that were left out of !262 (merged)'s scope. On managed-Postgres targets with non-trivial table counts (e.g. Supabase free tier with 242 tables in non-system schemas) these were the dominant per-scrape sample-count contributors and were driving the ~one-per-30s exceeds sample_limit=10000 rejections documented in #195 (closed).

What this changes

config/pgwatch-prometheus/metrics.yml (production metrics) and tests/compliance_vectors/test_mr262_pgwatch_topn_integration.py (integration coverage extended to the new metrics) — no Go code, no Dockerfile, no other configs.

Metrics ported (10 metric blocks, 11 SQL variants):

Metric Poll Was Now Effective on 242-table Supabase
table_stats (PG11) 30s ORDER BY size LIMIT 300, no aggregate Top-100 + '$other$' ranked by total_relation_size_b ~4 840 → ~2 000 + other (largest single saving)
table_stats (PG16) 30s same as PG11 + inner LIMIT 1500 pre-filter same as PG11 + inner LIMIT 1500 retained (perf optimisation) ~4 840 → ~2 000 + other
table_size_detailed 30s flat LIMIT 1000, no aggregate Top-100 + '$other$' ranked by total_relation_size_b ~1 200 → ~500 + other
pg_class 30s flat LIMIT 10000, no aggregate Top-100 + '$other$' ranked by relpages ~500-1700 → ~200 + other
pg_total_relation_size 30s flat LIMIT 5000, no aggregate Top-100 + '$other$' ranked by total relation size ~242 → 100 + other
pg_table_bloat 7200s flat LIMIT 1000, no aggregate Top-100 + '$other$' ranked by bloat_pct (with is_na = 0 preferred) ~2 400 → ~1 000 + other (once polled)
pg_btree_bloat 7200s flat LIMIT 1000, no aggregate Top-100 + '$other$' ranked by bloat_pct ~2 000 → ~1 000 + other (once polled)
pg_invalid_indexes 7200s row_number() over () (unstable) + flat LIMIT 1000 Top-100 + '$other$' ranked by index_size_bytes per-DB (typically 0 on healthy clusters)
redundant_indexes 10800s row_number() over () inside CTE + flat LIMIT 1000 Top-100 + '$other$' ranked by index_size_bytes (preserves duplicate-alias output that dashboards rely on) per-DB
unused_indexes 7200s flat LIMIT 1000 Top-100 + '$other$' ranked by index_size_bytes within the idx_scan=0 AND idx_is_btree filter per-DB
rarely_used_indexes 10800s 3-way UNION ALL of reason categories + ORDER BY grp, size desc LIMIT 1000 3-way UNION + Top-100 + '$other$' (combined; $other$ row's tag_reason = '$other$' to distinguish from real reason categories) per-DB

Pattern (identical shape to !262 (merged)'s four metrics):

WITH ranked AS (
  SELECT row_number() OVER (ORDER BY <relevance> DESC NULLS LAST,
                            <stable tiebreaker(s)>) AS rownum,
         <source columns>
  FROM <source view(s) / sub-CTE>
  [WHERE <existing filters>]
)
SELECT <columns>
FROM ranked
WHERE rownum <= 100
UNION ALL
SELECT '$other$', ..., <aggregates>
FROM ranked
WHERE rownum > 100
GROUP BY ()
HAVING count(*) > 0

Aggregate semantics on the '$other$' row

  • Sum for size and counter columns (*_size_b, *_size_bytes, seq_scan, n_tup_*, vacuum_count, etc.)
  • Min for "seconds since" timestamps (most recent maintenance across the tail — not 0)
  • Max for tx_freeze_age / mxid_freeze_age (oldest still-alive xid is the risk)
  • Weighted-avg for bloat percentages: 100 * sum(greatest(relpages - est_pages, 0)) / sum(relpages) — equivalent to "what fraction of the tail's total page count is wasted"
  • Avg for fillfactor
  • Recomputed tag_*_cardinality_mb labels from the summed bytes
  • '$other$' literal for tag columns that can't aggregate (schema names, index definitions, etc.)
  • max(is_na) propagates "tail has unreliable rows" upward

The '$other$' sentinel starts with $, which is illegal as the first character of an unquoted Postgres identifier, so it cannot collide with a real schema/table/index name.

Integration test coverage

tests/compliance_vectors/test_mr262_pgwatch_topn_integration.py was extended (originally MR-262 scope, now also covers MR-267):

  • seeded_cur_mr267 fixture — seeds 110 user tables on a live cluster and adds the postgres_ai.pg_statistic helper view that the bloat metrics depend on.
  • test_pgwatch_topn_new_metrics_parse_and_cap + test_pgwatch_topn_new_metrics_other_bucket_present — for the 4 FULL_CROSS_CHECK metrics (pg_total_relation_size, pg_class, table_size_detailed, table_stats × PG11/PG16 variants): SQL parses, ≤101 rows, exactly one $other$ row, exactly 100 top-N rows.
  • test_pgwatch_topn_new_metrics_other_bucket_aggregates_tail — for the same 4 FULL_CROSS_CHECK metrics: top-N + $other$ per-column sum equals the source-view total (catches data loss in the tail aggregation).
  • test_pgwatch_topn_table_stats_other_bucket_min_max_semantics — runtime cross-check of min()/max() aggregate semantics on the table_stats $other$ row: seconds_since_last_vacuum / _analyze match independent tail min(); tx_freeze_age / mxid_freeze_age match independent tail max(). Catches a min/max swap that would invert "most recently maintained" or understate wraparound risk.
  • test_pgwatch_topn_new_metrics_syntax_only + test_pgwatch_topn_syntax_only_no_other_when_below_cap — for the 6 SYNTAX_ONLY metrics (bloat + index-list): SQL parses on a seeded cluster, schema-tag/numeric column names present (catches UNION ALL alias swaps), and HAVING count(*) > 0 correctly suppresses the $other$ row when the source set has ≤100 qualifying rows.

The MR-262 metric count assertion (len(METRIC_KEYS) == 4) is unchanged — !267 (merged) metrics are tested via the new parametrize sets above rather than added to that legacy constant.

Empirical validation

Each ported SQL was validated against a live Postgres 17 (target-db on denis-postgresai-015-test) before the test extension was written:

  • All parse cleanly with psql -v ON_ERROR_STOP=1.
  • pg_total_relation_size: 68 rows (under cap, no $other$ row emitted — HAVING count(*) > 0 correctly suppresses).
  • With cap deliberately tightened to 10 for validation, the $other$ row's coalesce(sum(bytes), 0) exactly equals the sum of the omitted tail (verified: top-10 sum + other = total sum).
  • table_stats PG11: 101 rows = 100 top + 1 $other$.
  • table_stats PG16: 101 rows.
  • Bloat metrics use postgres_ai.pg_statistic (a pgwatch helper view not on the stock test target); substituting pg_stats for syntactic validation parses cleanly and returns 0 rows under the existing >1 MiB filter as expected.

End-to-end soak on a freshly-installed rc4 against Supabase (aws-0-eu-west-1.pooler.supabase.com, 242 tables): per-scrape sample count drops from ~12 100 (baseline, ~70% rejection rate) to 7 500–9 000 (1 000–2 500 of headroom under the 10 000 cap), zero exceeds sample_limit rejections over a 5-minute window. See the validation comment thread for full method + per-metric row counts.

What this does NOT do

  • Does not change sample_limit in config/prometheus/prometheus.yml. Whether the post-fix steady-state on real targets fits comfortably under 10 000 is a sizing question separate from the cap-shape work this MR does.
  • Does not address the drain bug (#198, MR !266). Those are independent — even after the drain wipe is removed, the per-relation fan-out on managed-Postgres targets would still exceed 10 000 without this MR's caps.

Test plan

  • Each ported SQL parses against PG17 (psql -v ON_ERROR_STOP=1)
  • Top-100 + $other$ row count correct on small targets (no $other$ when under cap)
  • $other$ aggregate sums verified equal to tail sum on pg_total_relation_size (validated with cap=10 against 68-row target)
  • YAML still parses (python3 -c "import yaml; yaml.safe_load(open(...))") after each commit
  • Integration test file extended to cover the 10 new metrics (FULL_CROSS_CHECK, SYNTAX_ONLY, tail-sum cross-check, HAVING-guard, and min/max semantics for table_stats)
  • Full test file (37 tests) passes against a live Postgres 18 cluster
  • End-to-end soak on rc4 against 242-table Supabase: per-scrape sample count drops from ~12 100 to 7 500–9 000, zero exceeds sample_limit rejections over 5+ min

Refs

🤖 Generated with Claude Code

Edited by Denis Morozov

Merge request reports

Loading