K001 Globally aggregated query metrics

The aim

The report shows, for the observed period (between two snapshots of pg_stat_statements/kcache), total values aggregated over all observed query groups in pg_stat_statements/kcache:

  • how many statement calls were detected;
  • what the aggregated total_time was (sum(total_time));
  • how many rows were processed,

etc. -- all the main cumulative metrics from pg_stat_statements/kcache.
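A minimal sketch of this aggregation, assuming each snapshot has already been loaded into a dictionary keyed by query group identifier. The snapshot shape, the `METRICS` tuple and the `aggregate_deltas` helper are hypothetical names for illustration. Restricting the sum to query groups present in both snapshots is also an assumption, not something this issue prescribes.

```python
# Hypothetical snapshot shape: {queryid: {"calls": int, "total_time": float, "rows": int}}
METRICS = ("calls", "total_time", "rows")


def aggregate_deltas(snap1, snap2):
    """Sum per-metric deltas over query groups present in BOTH snapshots.

    Assumption: groups seen in only one snapshot are skipped, because no
    delta can be computed for them.
    """
    totals = {m: 0 for m in METRICS}
    for queryid in snap1.keys() & snap2.keys():
        for m in METRICS:
            totals[m] += snap2[queryid][m] - snap1[queryid][m]
    return totals


# Made-up numbers: group 2 disappears and group 3 appears between snapshots.
snap1 = {
    1: {"calls": 100, "total_time": 500.0, "rows": 1000},
    2: {"calls": 10, "total_time": 50.0, "rows": 10},
}
snap2 = {
    1: {"calls": 150, "total_time": 800.0, "rows": 1600},
    3: {"calls": 5, "total_time": 20.0, "rows": 5},
}
totals = aggregate_deltas(snap1, snap2)
print(totals)
```

Only group 1 contributes here. Groups 2 and 3 are exactly the kind of mismatch that makes the aggregated values approximate.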

The problem of error in observations

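The problem is that, in general, the two snapshots contain different sets of query groups: pg_stat_statements tracks at most pg_stat_statements.max distinct statements and discards the least-executed ones once that limit is exceeded. The smaller pg_stat_statements.max is, the larger this difference can be. Because the report aggregates metrics over query groups, this difference directly affects the values shown in the report.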

The methodology to estimate the error (observing calls and total_time metrics) is described here: https://gitlab.com/postgres-ai-team/postgres-health-check/issues/179#warning-on-possible-error-related-to-difference-in-sets-of-query-groups
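The linked methodology is not reproduced here. Purely as an illustration of the kind of check involved, the sketch below computes one possible proxy: the share of a metric that belongs to query groups present in only one of the two snapshots. The `unmatched_share` helper, the snapshot shape and the sample numbers are all hypothetical, and the formula is an assumption rather than the one from the linked issue.

```python
def unmatched_share(snap1, snap2, metric):
    """Fraction of `metric`, summed over both snapshots, that belongs to
    query groups appearing in only one snapshot (illustrative proxy only)."""
    only_one = snap1.keys() ^ snap2.keys()  # symmetric difference of group ids
    unmatched = 0
    for queryid in only_one:
        source = snap1 if queryid in snap1 else snap2
        unmatched += source[queryid][metric]
    total = sum(g[metric] for g in snap1.values()) + sum(
        g[metric] for g in snap2.values()
    )
    return unmatched / total if total else 0.0


# Made-up numbers: group 2 is only in the first snapshot, group 3 only in the second.
first = {1: {"calls": 100, "total_time": 500.0}, 2: {"calls": 10, "total_time": 50.0}}
second = {1: {"calls": 150, "total_time": 800.0}, 3: {"calls": 5, "total_time": 20.0}}

calls_share = unmatched_share(first, second, "calls")
time_share = unmatched_share(first, second, "total_time")
print(f"calls: {calls_share:.1%}, total_time: {time_share:.1%}")
```

A larger share means a larger part of the workload cannot be compared between the snapshots, so the aggregated values deserve less trust.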

Acceptance criteria

As a DBA, I see the K001 report, which contains a table with just one row and many columns, from which I can conclude:

  • how many statement calls were detected between the two moments in time when the pg_stat_statements/kcache snapshots were created, and how many calls per second on average,
  • what the total total_time is for all queries, how many milliseconds per second, how many milliseconds per statement,
  • how many rows were processed, how many per second, how many per statement,
  • etc.
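The derived columns in such a single-row table can be sketched as follows, assuming the aggregated totals and the length of the observed period in seconds are already known. The `derive_rates` helper and the column naming (`*_per_sec`, `*_per_call`) are hypothetical and only illustrate the arithmetic.

```python
def derive_rates(totals, period_seconds):
    """Build one report row: each total, its per-second rate, and (for
    everything except calls itself) its per-call average."""
    calls = totals["calls"]
    row = {}
    for metric, value in totals.items():
        row[metric] = value
        row[f"{metric}_per_sec"] = value / period_seconds
        if metric != "calls":
            # Avoid division by zero when no calls were observed.
            row[f"{metric}_per_call"] = value / calls if calls else None
    return row


# Made-up totals for a 60-second observation period; total_time is in milliseconds.
row = derive_rates({"calls": 50, "total_time": 300.0, "rows": 600}, period_seconds=60)
for column, value in row.items():
    print(column, value)
```

With these numbers, total_time works out to 5 ms per second and 6 ms per statement, and rows to 10 per second and 12 per statement.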

Additionally, I can see the estimated error caused by the difference between the sets of query groups in the two observed snapshots. If it is small enough (<10%), I can conclude that the presented values are trustworthy.

Edited by Nikolay Samokhvalov