Parse query plans instead of SQL statements
There are two areas for improvement: a) identifying queries with bad estimates in the plan, rather than just long-running queries, and b) the actual filter conditions, which are hard to parse out of the query text.
One idea to improve both could be to parse query plans instead of SQL statements. This way, one could compare `Plan Rows` with `Actual Rows` of the individual nodes and identify places with bad estimates that could benefit from (extended) statistics. Further, it would make it easier to extract the affected tables and columns by referring to the `Relation Name` and `Filter` or `Index Cond` attributes, which are already pre-parsed and normalised (ref. issue #1). The query plans can be retrieved in JSON format, which further eases parsing. Just experiment with `EXPLAIN (ANALYZE, FORMAT JSON) <your test query>` to see what I mean.
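To make the idea concrete, here is a minimal sketch in Python that walks the JSON plan tree and flags nodes whose estimate is far off. The embedded plan is a hand-written, abridged approximation of EXPLAIN output; real plans carry many more attributes per node, and the `threshold` of 10x is an arbitrary placeholder:

```python
import json

# Hypothetical, abridged EXPLAIN (ANALYZE, FORMAT JSON) output.
PLAN_JSON = """
[{"Plan": {
    "Node Type": "Seq Scan",
    "Relation Name": "orders",
    "Filter": "(status = 'open'::text)",
    "Plan Rows": 5,
    "Actual Rows": 48000,
    "Plans": []
}}]
"""

def bad_estimates(node, threshold=10):
    """Yield (relation, filter, plan_rows, actual_rows) for plan nodes
    whose row estimate is off by more than `threshold`x either way."""
    plan_rows = node.get("Plan Rows", 0)
    actual_rows = node.get("Actual Rows", 0)
    if plan_rows and actual_rows and (
        actual_rows / plan_rows > threshold
        or plan_rows / actual_rows > threshold
    ):
        yield (node.get("Relation Name"), node.get("Filter"),
               plan_rows, actual_rows)
    # Recurse into child nodes, which EXPLAIN nests under "Plans".
    for child in node.get("Plans", []):
        yield from bad_estimates(child, threshold)

root = json.loads(PLAN_JSON)[0]["Plan"]
print(list(bad_estimates(root)))
# → [('orders', "(status = 'open'::text)", 5, 48000)]
```

The hit points straight at the table (`Relation Name`) and condition (`Filter`) that might benefit from extended statistics, with no SQL text parsing involved.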
Unfortunately, it's not really easy to automagically access the query plans that were used at runtime - they are not even stable per query. But there are ways:
- The `auto_explain` extension allows writing the plans to the log during execution if they match certain conditions.
- Then configure `log_destination` to `csvlog` and use `file_fdw` to access the logfile at runtime.
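A sketch of the server configuration this would take - the GUC names are real `auto_explain` settings, but the threshold values below are just placeholders:

```
# postgresql.conf - assumed values, tune to taste
shared_preload_libraries = 'auto_explain'
logging_collector = on
log_destination = 'csvlog'
auto_explain.log_min_duration = '250ms'  # log plans of statements slower than this
auto_explain.log_analyze = on            # include actual row counts
auto_explain.log_format = 'json'         # emit the plan as JSON
```

The `file_fdw` foreign table over the csvlog then has to mirror the full csvlog column layout documented in the `file_fdw` chapter of the PostgreSQL manual.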
You can even match the plan to a row in the `pg_stat_statements` view using its `query_id` - in PostgreSQL 14 by activating the `compute_query_id` GUC, or in earlier versions by using an extension like `pg_logqueryid`.
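For illustration, the matching could look like this - the `queryid` literal is a hypothetical value taken from a logged plan:

```sql
-- PostgreSQL 14+: have the server compute query IDs
-- (in postgresql.conf: compute_query_id = on)

-- Join a query_id found in the logged plan back to the
-- cumulative statistics in pg_stat_statements:
SELECT query, calls, mean_exec_time
FROM   pg_stat_statements
WHERE  queryid = -1234567890;  -- hypothetical value from the log
```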
Going further - or as another approach - it might be possible to parse the query (even with its placeholders) with `libpg_query` (or one of its wrappers like `pglast` or `psqlparse`).
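The parse-tree route can be sketched shape-agnostically: `libpg_query` (and its wrappers) emit the statement as a nested structure, so a recursive walk can collect `ColumnRef` nodes wherever they appear. The sample tree below is a hand-written, abridged approximation - the exact node layout varies by PostgreSQL version:

```python
# Hypothetical fragment of a libpg_query-style JSON parse tree for
# "SELECT ... WHERE status = $1"; the exact shape is version-dependent,
# but ColumnRef nodes carry the referenced column names.
SAMPLE = {"SelectStmt": {"whereClause": {"A_Expr": {
    "lexpr": {"ColumnRef": {"fields": [{"String": {"sval": "status"}}]}},
    "rexpr": {"ParamRef": {"number": 1}},
}}}}

def column_refs(node):
    """Recursively collect column names from ColumnRef nodes."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "ColumnRef":
                yield ".".join(
                    f["String"]["sval"]
                    for f in value.get("fields", [])
                    if "String" in f
                )
            else:
                yield from column_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from column_refs(item)

print(list(column_refs(SAMPLE)))
# → ['status']
```

The walker finds the columns referenced in the WHERE clause even though the actual values are placeholders, which is exactly what normalised `pg_stat_statements` queries contain.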
Unfortunately, this requires a lot more extensions and configuration and is therefore not as 'cloud-friendly' as the current approach of just reading the `pg_stat_statements` view.