fix: filter pg_stat_statements by calls AND exec_time to cut long tail (!220) · Merge requests · PostgresAI / postgresai

Problem

The pg_stat_statements metrics in both config/pgwatch-prometheus/metrics.yml and config/pgwatch-postgres/metrics.yml export every queryid from pg_stat_statements with no filtering.

On databases with high query diversity (e.g., Supabase with inngest trace comments creating unique queryids), this produces hundreds of thousands of unique queryids. Observed on production:

948,916 unique queryids in pgss_queryid_queries table (1.3 GB)
Flask backend (/query_info_metrics) OOM crash loop every ~5 minutes trying to process all queryids
VictoriaMetrics unable to scrape query info → Dashboard 2 panels empty
pgwatch_query_info metric never populated

Fix

Add WHERE calls >= 3 AND total_exec_time >= 1000 filter to both metrics files:

Prometheus sink (pgwatch-prometheus/metrics.yml): filters on calls >= 3 AND exec_time_total >= 1000 for both PG 11 and PG 17 SQL variants
PostgreSQL sink (pgwatch-postgres/metrics.yml): filters on calls >= 3 AND total_exec_time >= 1000

Using AND (not OR) ensures only queries that are both frequently called AND have significant execution time are exported, cutting the long tail of one-off or trivial queries.

Note: Previous approach used LIMIT 100 which is replaced by this threshold-based filter. The threshold approach is preferred because it adapts to the actual workload rather than imposing an arbitrary cap.

Production validation

Applied on production (34.203.111.251) on 2026-02-16:

Truncated pgss_queryid_queries table (cleared 948K stale rows)
Patched both metrics.yml files
Restarted pgwatch-postgres, pgwatch-prometheus, flask backend
Result: table repopulated with 851 rows (down from 948,916), flask backend stable (no more OOM kills)

Closes #143 (closed)

Edited Feb 16, 2026 by Nikolay Samokhvalov

fix: filter pg_stat_statements by calls AND exec_time to cut long tail

Problem

Fix

Production validation

Closes #143 (closed)

Merge request reports