K002 Workload type

The aim

While K001 shows just one line with "totals", and K003 shows detailed stats, per each query group, report K002 aims to give an understanding of workload type: how classes of queries (SELECT vs INSERT vs DELETE, etc) are distributed, by any of cumulative metric – from the main ones as calls, total_time, rows to secondary, IO and CPU related.

How to get data

We detect class in simple and straightforward way – observing query's the first word. Of course, this approach has drawbacks – for example, it won't be really helpful if a lot of PLpgSQL functions are being used in the project, and among them there are some read-only, and there are modifying ones or even with DDL.

Nevertheless, this approach is good because it's simple. The query (it also distinguishes regular, non-blocking SELECTs from SELECT..FOR UPDATE and other blocking SELECTs)

with data as (
  select
    case lower(regexp_replace(s.query, '^\W*(\w+)\W+.*$',  '\1'))
      when 'select' then
        case
          when s.query ~* 'for\W+(no\W+key\W+)?update' then 'select ... for [no key] update'
          when s.query ~* 'for\W+(key\W+)?share' then 'select ... for [key] share'
          else 'select'
        end
      else lower(regexp_replace(s.query, '^\W*(\w+)\W+.*$',  '\1'))
    end as word,
    count(*) cnt,
    sum(calls) calls,
    sum(s.total_time) total_time,
    1000 * sum(k.user_time) user_time,
    1000 * sum(k.system_time) system_time,
    sum(reads_blks) reads_blks,
    sum(writes_blks) writes_blks
  from pg_stat_statements s
  left join pg_stat_kcache_detail k using (query)
  group by 1
)
select
  word,
  calls,
  round(100 * calls::numeric / (sum(calls) over())::numeric, 2) as "Calls, %",
  round(total_time::numeric, 2) total_time,
  round(100 * total_time::numeric / (sum(total_time) over())::numeric, 2) as "Time, %",
  reads_blks,
  round(100 * reads_blks::numeric / (sum(reads_blks) over())::numeric, 2) as "Read blocks, %",
  ... -- all other metrics
from data
order by total_time desc;

Acceptance criteria

As a DBA, I see the table with several rows (one row for INSERTs, one for DELETEs, etc). I can understand, what is the ratio (in % – see https://gitlab.com/postgres-ai-team/postgres-health-check/issues/179) of, say DELETEs, if we consider total_time. Or if we consider rows. Or shared_blks_read.

For each query class, I also see absolute values for each metric, as well as differentiated (per second, per call) values.

Edited by Nikolay Samokhvalov