Skip to content

Fix Postgres_exporter queries on PG14

After performing PG12 to PG14 upgrade test in the GPRD Main cluster - production#10855 (closed), I've identified the following postgres_exporter queries failing on PG14.

1. Error - pg_stat_statements.total_time does not exist

2023-05-08 22:24:56.697 UTC,"postgres_exporter","gitlabhq_production",120177,"127.0.0.1:59384",645976b6.1d571,8,"SELECT",2023-05-08 22:24:54 UTC,6/57,0,ERROR,42703,"column pg_stat_statements.total_time does not exist",,,,,,"SELECT
  pg_get_userbyid(userid) as user,
  pg_database.datname,
  pg_stat_statements.queryid,
  pg_stat_statements.calls,
  pg_stat_statements.total_time / 1000.0 as seconds_total,
  pg_stat_statements.rows,
  pg_stat_statements.blk_read_time / 1000.0 as block_read_seconds_total,
  pg_stat_statements.blk_write_time / 1000.0 as block_write_seconds_total
  FROM public.pg_stat_statements
  JOIN pg_database
    ON pg_database.oid = pg_stat_statements.dbid
  WHERE
    total_time > (
      SELECT percentile_cont(0.1)
        WITHIN GROUP (ORDER BY total_time)
        FROM public.pg_stat_statements
    )
  ORDER BY seconds_total DESC
  LIMIT 500

Postgres_exporter Query

pg_stat_statements:
  query: |
    SELECT
      pg_get_userbyid(userid) as user,
      pg_database.datname,
      pg_stat_statements.queryid,
      pg_stat_statements.calls,
      pg_stat_statements.total_time / 1000.0 as seconds_total,
      pg_stat_statements.rows,
      pg_stat_statements.blk_read_time / 1000.0 as block_read_seconds_total,
      pg_stat_statements.blk_write_time / 1000.0 as block_write_seconds_total
      FROM public.pg_stat_statements
      JOIN pg_database
        ON pg_database.oid = pg_stat_statements.dbid
      WHERE
        total_time > (
          SELECT percentile_cont(0.1)
            WITHIN GROUP (ORDER BY total_time)
            FROM public.pg_stat_statements
        )
      ORDER BY seconds_total DESC
      LIMIT 500
  metrics:
    - user:
        usage: "LABEL"
        description: "The user who executed the statement"
    - datname:
        usage: "LABEL"
        description: "The database in which the statement was executed"
    - queryid:
        usage: "LABEL"
        description: "Internal hash code, computed from the statement's parse tree"
    - calls:
        usage: "COUNTER"
        description: "Number of times executed"
    - seconds_total:
        usage: "COUNTER"
        description: "Total time spent in the statement, in seconds"
    - rows:
        usage: "COUNTER"
        description: "Total number of rows retrieved or affected by the statement"
    - block_read_seconds_total:
        usage: "COUNTER"
        description: "Total time the statement spent reading blocks, in seconds"
    - block_write_seconds_total:
        usage: "COUNTER"
        description: "Total time the statement spent writing blocks, in seconds"

PG12 - pg_stat_statements

gitlabhq_production=# \d pg_stat_statements
                    View "public.pg_stat_statements"
       Column        |       Type       | Collation | Nullable | Default
---------------------+------------------+-----------+----------+---------
 userid              | oid              |           |          |
 dbid                | oid              |           |          |
 queryid             | bigint           |           |          |
 query               | text             |           |          |
 calls               | bigint           |           |          |
 total_time          | double precision |           |          |
 min_time            | double precision |           |          |
 max_time            | double precision |           |          |
 mean_time           | double precision |           |          |
 stddev_time         | double precision |           |          |
 rows                | bigint           |           |          |
 shared_blks_hit     | bigint           |           |          |
 shared_blks_read    | bigint           |           |          |
 shared_blks_dirtied | bigint           |           |          |
 shared_blks_written | bigint           |           |          |
 local_blks_hit      | bigint           |           |          |
 local_blks_read     | bigint           |           |          |
 local_blks_dirtied  | bigint           |           |          |
 local_blks_written  | bigint           |           |          |
 temp_blks_read      | bigint           |           |          |
 temp_blks_written   | bigint           |           |          |
 blk_read_time       | double precision |           |          |
 blk_write_time      | double precision |           |          |

PG14 - pg_stat_statements

                    View "public.pg_stat_statements"
       Column        |       Type       | Collation | Nullable | Default
---------------------+------------------+-----------+----------+---------
 userid              | oid              |           |          |
 dbid                | oid              |           |          |
 toplevel            | boolean          |           |          |
 queryid             | bigint           |           |          |
 query               | text             |           |          |
 plans               | bigint           |           |          |
 total_plan_time     | double precision |           |          |
 min_plan_time       | double precision |           |          |
 max_plan_time       | double precision |           |          |
 mean_plan_time      | double precision |           |          |
 stddev_plan_time    | double precision |           |          |
 calls               | bigint           |           |          |
 total_exec_time     | double precision |           |          |
 min_exec_time       | double precision |           |          |
 max_exec_time       | double precision |           |          |
 mean_exec_time      | double precision |           |          |
 stddev_exec_time    | double precision |           |          |
 rows                | bigint           |           |          |
 shared_blks_hit     | bigint           |           |          |
 shared_blks_read    | bigint           |           |          |
 shared_blks_dirtied | bigint           |           |          |
 shared_blks_written | bigint           |           |          |
 local_blks_hit      | bigint           |           |          |
 local_blks_read     | bigint           |           |          |
 local_blks_dirtied  | bigint           |           |          |
 local_blks_written  | bigint           |           |          |
 temp_blks_read      | bigint           |           |          |
 temp_blks_written   | bigint           |           |          |
 blk_read_time       | double precision |           |          |
 blk_write_time      | double precision |           |          |
 wal_records         | bigint           |           |          |
 wal_fpi             | bigint           |           |          |
 wal_bytes           | numeric          |           |          |

PG14 pg_stat_statements does not have total_time column but it has total_plan_time and total_exec_time

2. Error - column pg_stat_ssl.compression does not exist

2023-05-08 22:24:55.065 UTC,"postgres_exporter","gitlabhq_production",120177,"127.0.0.1:59384",645976b6.1d571,4,"SELECT",2023-05-08 22:24:54 UTC,6/33,0,ERROR,42703,"column ""compression"" does not exist",,,,,,"SELECT pid, bits,
       CASE WHEN ssl THEN 1.0 ELSE 0.0 END AS active,
       CASE WHEN compression THEN 1.0 ELSE 0.0 END AS compression
       FROM pg_stat_ssl

Postgres_exporter Query

pg_stat_ssl:
  query: |
    SELECT pid, bits,
           CASE WHEN ssl THEN 1.0 ELSE 0.0 END AS active,
           CASE WHEN compression THEN 1.0 ELSE 0.0 END AS compression
           FROM pg_stat_ssl
  metrics:
    - pid:
        usage: "LABEL"
        description: "Process ID of a backend or WAL sender process"
    - active:
        usage: "GAUGE"
        description: "Boolean flag indicating if SSL is used on this connection"
    - bits:
        usage: "GAUGE"
        description: "Number of bits in the encryption algorithm is in use"
    - compression:
        usage: "GAUGE"
        description: "Boolean flag indicating if SSL compression is in use"

PG12 - pg_stat_ssl

gitlabhq_production=#  \d pg_stat_ssl
              View "pg_catalog.pg_stat_ssl"
    Column     |  Type   | Collation | Nullable | Default
---------------+---------+-----------+----------+---------
 pid           | integer |           |          |
 ssl           | boolean |           |          |
 version       | text    |           |          |
 cipher        | text    |           |          |
 bits          | integer |           |          |
 compression   | boolean |           |          |
 client_dn     | text    |           |          |
 client_serial | numeric |           |          |
 issuer_dn     | text    |           |          |

PG14 - pg_stat_ssl

gitlabhq_production=# \d pg_stat_ssl
              View "pg_catalog.pg_stat_ssl"
    Column     |  Type   | Collation | Nullable | Default
---------------+---------+-----------+----------+---------
 pid           | integer |           |          |
 ssl           | boolean |           |          |
 version       | text    |           |          |
 cipher        | text    |           |          |
 bits          | integer |           |          |
 client_dn     | text    |           |          |
 client_serial | numeric |           |          |
 issuer_dn     | text    |           |          |

PG14 pg_stat_ssl does not have compression column

3. Error - column pg_stat_wal_receiver.received_lsn does not exist

2023-05-08 22:25:00.211 UTC,"postgres_exporter","gitlabhq_production",277177,"127.0.0.1:57366",645976bb.43ab9,6,"SELECT",2023-05-08 22:24:59 UTC,4/38,0,ERROR,42703,"column ""received_lsn"" does not exist",,"Perhaps you meant to reference the column ""pg_stat_wal_receiver.received_tli"".",,,,"SELECT case status when 'stopped' then 0 when 'starting' then 1 when 'streaming' then 2 when 'waiting' then 3 when 'restarting' then 4 when 'stopping' then 5 else -1 end as status,
       (receive_start_lsn- '0/0') % (2^52)::bigint as receive_start_lsn,
       receive_start_tli,
       (received_lsn- '0/0') % (2^52)::bigint as received_lsn,
       received_tli,
       extract(epoch from last_msg_send_time) as last_msg_send_time,
       extract(epoch from last_msg_receipt_time) as last_msg_receipt_time,
       (latest_end_lsn - '0/0') % (2^52)::bigint as latest_end_lsn,
       extract(epoch from latest_end_time) as latest_end_time,
       substring(slot_name from 'repmgr_slot_([0-9]*)') as upstream_node,
       trim(both '''' from substring(conninfo from 'host=([^ ]*)')) as upstream_host,
       slot_name
  FROM pg_catalog.pg_stat_wal_receiver
",289,,"","client backend",,0

Postgres_exporter Query

pg_stat_wal_receiver:
  query: |
    SELECT case status when 'stopped' then 0 when 'starting' then 1 when 'streaming' then 2 when 'waiting' then 3 when 'restarting' then 4 when 'stopping' then 5 else -1 end as status,
           (receive_start_lsn- '0/0') % (2^52)::bigint as receive_start_lsn,
           receive_start_tli,
           (received_lsn- '0/0') % (2^52)::bigint as received_lsn,
           received_tli,
           extract(epoch from last_msg_send_time) as last_msg_send_time,
           extract(epoch from last_msg_receipt_time) as last_msg_receipt_time,
           (latest_end_lsn - '0/0') % (2^52)::bigint as latest_end_lsn,
           extract(epoch from latest_end_time) as latest_end_time,
           substring(slot_name from 'repmgr_slot_([0-9]*)') as upstream_node,
           trim(both '''' from substring(conninfo from 'host=([^ ]*)')) as upstream_host,
           slot_name
      FROM pg_catalog.pg_stat_wal_receiver
  metrics:
    - status:
        usage: "GAUGE"
        description: "Activity status of the WAL receiver process (0=stopped 1=starting 2=streaming 3=waiting 4=restarting 5=stopping)"
    - receive_start_lsn:
        usage: "COUNTER"
        description: "First transaction log position used when WAL receiver is started"
    - receive_start_tli:
        usage: "GAUGE"
        description: "First timeline number used when WAL receiver is started"
    - received_lsn:
        usage: "COUNTER"
        description: "Last transaction log position already received and flushed to disk, the initial value of this field being the first log position used when WAL receiver is started"
    - received_tli:
        usage: "GAUGE"
        description: "Timeline number of last transaction log position received and flushed to disk, the initial value of this field being the timeline number of the first log position used when WAL receiver is started"
    - last_msg_send_time:
        usage: "COUNTER"
        description: "Send time of last message received from origin WAL sender"
    - last_msg_receipt_time:
        usage: "COUNTER"
        description: "Receipt time of last message received from origin WAL sender"
    - latest_end_lsn:
        usage: "COUNTER"
        description: "Last transaction log position reported to origin WAL sender"
    - latest_end_time:
        usage: "COUNTER"
        description: "Time of last transaction log position reported to origin WAL sender"
    - upstream_node:
        usage: "GAUGE"
        description: "The repmgr node from the upstream slot name"
    - upstream_host:
        usage: "LABEL"
        description: "The upstream host this node is replicating from"
    - slot_name:
        usage: "LABEL"
        description: "The upstream slot_name this node is replicating from"

PG12 - pg_stat_wal_receiver

gitlabhq_production=# \d pg_stat_wal_receiver
                      View "pg_catalog.pg_stat_wal_receiver"
        Column         |           Type           | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+---------
 pid                   | integer                  |           |          |
 status                | text                     |           |          |
 receive_start_lsn     | pg_lsn                   |           |          |
 receive_start_tli     | integer                  |           |          |
 received_lsn          | pg_lsn                   |           |          |
 received_tli          | integer                  |           |          |
 last_msg_send_time    | timestamp with time zone |           |          |
 last_msg_receipt_time | timestamp with time zone |           |          |
 latest_end_lsn        | pg_lsn                   |           |          |
 latest_end_time       | timestamp with time zone |           |          |
 slot_name             | text                     |           |          |
 sender_host           | text                     |           |          |
 sender_port           | integer                  |           |          |
 conninfo              | text                     |           |          |

PG14 - pg_stat_wal_receiver

gitlabhq_production=# \d pg_stat_wal_receiver
                      View "pg_catalog.pg_stat_wal_receiver"
        Column         |           Type           | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+---------
 pid                   | integer                  |           |          |
 status                | text                     |           |          |
 receive_start_lsn     | pg_lsn                   |           |          |
 receive_start_tli     | integer                  |           |          |
 written_lsn           | pg_lsn                   |           |          |
 flushed_lsn           | pg_lsn                   |           |          |
 received_tli          | integer                  |           |          |
 last_msg_send_time    | timestamp with time zone |           |          |
 last_msg_receipt_time | timestamp with time zone |           |          |
 latest_end_lsn        | pg_lsn                   |           |          |
 latest_end_time       | timestamp with time zone |           |          |
 slot_name             | text                     |           |          |
 sender_host           | text                     |           |          |
 sender_port           | integer                  |           |          |
 conninfo              | text                     |           |          |

PG14 pg_stat_wal_receiver does not have received_lsn column

Acceptance criteria

  • Come up with a solution to resolve the postgres_exporter query error(s)
  • Apply the solution on the recently upgraded PG14 cluster(s)
Edited by Biren Shah