Fix Postgres_exporter queries on PG14
After performing PG12 to PG14 upgrade test in the GPRD Main cluster - production#10855 (closed), I've identified the following postgres_exporter
queries failing on PG14.
1. Error - pg_stat_statements.total_time does not exist
2023-05-08 22:24:56.697 UTC,"postgres_exporter","gitlabhq_production",120177,"127.0.0.1:59384",645976b6.1d571,8,"SELECT",2023-05-08 22:24:54 UTC,6/57,0,ERROR,42703,"column pg_stat_statements.total_time does not exist",,,,,,"SELECT
pg_get_userbyid(userid) as user,
pg_database.datname,
pg_stat_statements.queryid,
pg_stat_statements.calls,
pg_stat_statements.total_time / 1000.0 as seconds_total,
pg_stat_statements.rows,
pg_stat_statements.blk_read_time / 1000.0 as block_read_seconds_total,
pg_stat_statements.blk_write_time / 1000.0 as block_write_seconds_total
FROM public.pg_stat_statements
JOIN pg_database
ON pg_database.oid = pg_stat_statements.dbid
WHERE
total_time > (
SELECT percentile_cont(0.1)
WITHIN GROUP (ORDER BY total_time)
FROM public.pg_stat_statements
)
ORDER BY seconds_total DESC
LIMIT 500
Postgres_exporter Query
pg_stat_statements:
query: |
SELECT
pg_get_userbyid(userid) as user,
pg_database.datname,
pg_stat_statements.queryid,
pg_stat_statements.calls,
pg_stat_statements.total_time / 1000.0 as seconds_total,
pg_stat_statements.rows,
pg_stat_statements.blk_read_time / 1000.0 as block_read_seconds_total,
pg_stat_statements.blk_write_time / 1000.0 as block_write_seconds_total
FROM public.pg_stat_statements
JOIN pg_database
ON pg_database.oid = pg_stat_statements.dbid
WHERE
total_time > (
SELECT percentile_cont(0.1)
WITHIN GROUP (ORDER BY total_time)
FROM public.pg_stat_statements
)
ORDER BY seconds_total DESC
LIMIT 500
metrics:
- user:
usage: "LABEL"
description: "The user who executed the statement"
- datname:
usage: "LABEL"
description: "The database in which the statement was executed"
- queryid:
usage: "LABEL"
description: "Internal hash code, computed from the statement's parse tree"
- calls:
usage: "COUNTER"
description: "Number of times executed"
- seconds_total:
usage: "COUNTER"
description: "Total time spent in the statement, in seconds"
- rows:
usage: "COUNTER"
description: "Total number of rows retrieved or affected by the statement"
- block_read_seconds_total:
usage: "COUNTER"
description: "Total time the statement spent reading blocks, in seconds"
- block_write_seconds_total:
usage: "COUNTER"
description: "Total time the statement spent writing blocks, in seconds"
PG12 - pg_stat_statements
gitlabhq_production=# \d pg_stat_statements
View "public.pg_stat_statements"
Column | Type | Collation | Nullable | Default
---------------------+------------------+-----------+----------+---------
userid | oid | | |
dbid | oid | | |
queryid | bigint | | |
query | text | | |
calls | bigint | | |
total_time | double precision | | |
min_time | double precision | | |
max_time | double precision | | |
mean_time | double precision | | |
stddev_time | double precision | | |
rows | bigint | | |
shared_blks_hit | bigint | | |
shared_blks_read | bigint | | |
shared_blks_dirtied | bigint | | |
shared_blks_written | bigint | | |
local_blks_hit | bigint | | |
local_blks_read | bigint | | |
local_blks_dirtied | bigint | | |
local_blks_written | bigint | | |
temp_blks_read | bigint | | |
temp_blks_written | bigint | | |
blk_read_time | double precision | | |
blk_write_time | double precision | | |
PG14 - pg_stat_statements
View "public.pg_stat_statements"
Column | Type | Collation | Nullable | Default
---------------------+------------------+-----------+----------+---------
userid | oid | | |
dbid | oid | | |
toplevel | boolean | | |
queryid | bigint | | |
query | text | | |
plans | bigint | | |
total_plan_time | double precision | | |
min_plan_time | double precision | | |
max_plan_time | double precision | | |
mean_plan_time | double precision | | |
stddev_plan_time | double precision | | |
calls | bigint | | |
total_exec_time | double precision | | |
min_exec_time | double precision | | |
max_exec_time | double precision | | |
mean_exec_time | double precision | | |
stddev_exec_time | double precision | | |
rows | bigint | | |
shared_blks_hit | bigint | | |
shared_blks_read | bigint | | |
shared_blks_dirtied | bigint | | |
shared_blks_written | bigint | | |
local_blks_hit | bigint | | |
local_blks_read | bigint | | |
local_blks_dirtied | bigint | | |
local_blks_written | bigint | | |
temp_blks_read | bigint | | |
temp_blks_written | bigint | | |
blk_read_time | double precision | | |
blk_write_time | double precision | | |
wal_records | bigint | | |
wal_fpi | bigint | | |
wal_bytes | numeric | | |
PG14 pg_stat_statements does not have total_time column but it has total_plan_time and total_exec_time
2. Error - column pg_stat_ssl.compression does not exist
2023-05-08 22:24:55.065 UTC,"postgres_exporter","gitlabhq_production",120177,"127.0.0.1:59384",645976b6.1d571,4,"SELECT",2023-05-08 22:24:54 UTC,6/33,0,ERROR,42703,"column ""compression"" does not exist",,,,,,"SELECT pid, bits,
CASE WHEN ssl THEN 1.0 ELSE 0.0 END AS active,
CASE WHEN compression THEN 1.0 ELSE 0.0 END AS compression
FROM pg_stat_ssl
Postgres_exporter Query
pg_stat_ssl:
query: |
SELECT pid, bits,
CASE WHEN ssl THEN 1.0 ELSE 0.0 END AS active,
CASE WHEN compression THEN 1.0 ELSE 0.0 END AS compression
FROM pg_stat_ssl
metrics:
- pid:
usage: "LABEL"
description: "Process ID of a backend or WAL sender process"
- active:
usage: "GAUGE"
description: "Boolean flag indicating if SSL is used on this connection"
- bits:
usage: "GAUGE"
description: "Number of bits in the encryption algorithm is in use"
- compression:
usage: "GAUGE"
description: "Boolean flag indicating if SSL compression is in use"
PG12 - pg_stat_ssl
gitlabhq_production=# \d pg_stat_ssl
View "pg_catalog.pg_stat_ssl"
Column | Type | Collation | Nullable | Default
---------------+---------+-----------+----------+---------
pid | integer | | |
ssl | boolean | | |
version | text | | |
cipher | text | | |
bits | integer | | |
compression | boolean | | |
client_dn | text | | |
client_serial | numeric | | |
issuer_dn | text | | |
PG14 - pg_stat_ssl
gitlabhq_production=# \d pg_stat_ssl
View "pg_catalog.pg_stat_ssl"
Column | Type | Collation | Nullable | Default
---------------+---------+-----------+----------+---------
pid | integer | | |
ssl | boolean | | |
version | text | | |
cipher | text | | |
bits | integer | | |
client_dn | text | | |
client_serial | numeric | | |
issuer_dn | text | | |
PG14 pg_stat_ssl does not have compression column
3. Error - column pg_stat_wal_receiver.received_lsn does not exist
2023-05-08 22:25:00.211 UTC,"postgres_exporter","gitlabhq_production",277177,"127.0.0.1:57366",645976bb.43ab9,6,"SELECT",2023-05-08 22:24:59 UTC,4/38,0,ERROR,42703,"column ""received_lsn"" does not exist",,"Perhaps you meant to reference the column ""pg_stat_wal_receiver.received_tli"".",,,,"SELECT case status when 'stopped' then 0 when 'starting' then 1 when 'streaming' then 2 when 'waiting' then 3 when 'restarting' then 4 when 'stopping' then 5 else -1 end as status,
(receive_start_lsn- '0/0') % (2^52)::bigint as receive_start_lsn,
receive_start_tli,
(received_lsn- '0/0') % (2^52)::bigint as received_lsn,
received_tli,
extract(epoch from last_msg_send_time) as last_msg_send_time,
extract(epoch from last_msg_receipt_time) as last_msg_receipt_time,
(latest_end_lsn - '0/0') % (2^52)::bigint as latest_end_lsn,
extract(epoch from latest_end_time) as latest_end_time,
substring(slot_name from 'repmgr_slot_([0-9]*)') as upstream_node,
trim(both '''' from substring(conninfo from 'host=([^ ]*)')) as upstream_host,
slot_name
FROM pg_catalog.pg_stat_wal_receiver
",289,,"","client backend",,0
Postgres_exporter Query
pg_stat_wal_receiver:
query: |
SELECT case status when 'stopped' then 0 when 'starting' then 1 when 'streaming' then 2 when 'waiting' then 3 when 'restarting' then 4 when 'stopping' then 5 else -1 end as status,
(receive_start_lsn- '0/0') % (2^52)::bigint as receive_start_lsn,
receive_start_tli,
(received_lsn- '0/0') % (2^52)::bigint as received_lsn,
received_tli,
extract(epoch from last_msg_send_time) as last_msg_send_time,
extract(epoch from last_msg_receipt_time) as last_msg_receipt_time,
(latest_end_lsn - '0/0') % (2^52)::bigint as latest_end_lsn,
extract(epoch from latest_end_time) as latest_end_time,
substring(slot_name from 'repmgr_slot_([0-9]*)') as upstream_node,
trim(both '''' from substring(conninfo from 'host=([^ ]*)')) as upstream_host,
slot_name
FROM pg_catalog.pg_stat_wal_receiver
metrics:
- status:
usage: "GAUGE"
description: "Activity status of the WAL receiver process (0=stopped 1=starting 2=streaming 3=waiting 4=restarting 5=stopping)"
- receive_start_lsn:
usage: "COUNTER"
description: "First transaction log position used when WAL receiver is started"
- receive_start_tli:
usage: "GAUGE"
description: "First timeline number used when WAL receiver is started"
- received_lsn:
usage: "COUNTER"
description: "Last transaction log position already received and flushed to disk, the initial value of this field being the first log position used when WAL receiver is started"
- received_tli:
usage: "GAUGE"
description: "Timeline number of last transaction log position received and flushed to disk, the initial value of this field being the timeline number of the first log position used when WAL receiver is started"
- last_msg_send_time:
usage: "COUNTER"
description: "Send time of last message received from origin WAL sender"
- last_msg_receipt_time:
usage: "COUNTER"
description: "Receipt time of last message received from origin WAL sender"
- latest_end_lsn:
usage: "COUNTER"
description: "Last transaction log position reported to origin WAL sender"
- latest_end_time:
usage: "COUNTER"
description: "Time of last transaction log position reported to origin WAL sender"
- upstream_node:
usage: "GAUGE"
description: "The repmgr node from the upstream slot name"
- upstream_host:
usage: "LABEL"
description: "The upstream host this node is replicating from"
- slot_name:
usage: "LABEL"
description: "The upstream slot_name this node is replicating from"
PG12 - pg_stat_wal_receiver
gitlabhq_production=# \d pg_stat_wal_receiver
View "pg_catalog.pg_stat_wal_receiver"
Column | Type | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+---------
pid | integer | | |
status | text | | |
receive_start_lsn | pg_lsn | | |
receive_start_tli | integer | | |
received_lsn | pg_lsn | | |
received_tli | integer | | |
last_msg_send_time | timestamp with time zone | | |
last_msg_receipt_time | timestamp with time zone | | |
latest_end_lsn | pg_lsn | | |
latest_end_time | timestamp with time zone | | |
slot_name | text | | |
sender_host | text | | |
sender_port | integer | | |
conninfo | text | | |
PG14 - pg_stat_wal_receiver
gitlabhq_production=# \d pg_stat_wal_receiver
View "pg_catalog.pg_stat_wal_receiver"
Column | Type | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+---------
pid | integer | | |
status | text | | |
receive_start_lsn | pg_lsn | | |
receive_start_tli | integer | | |
written_lsn | pg_lsn | | |
flushed_lsn | pg_lsn | | |
received_tli | integer | | |
last_msg_send_time | timestamp with time zone | | |
last_msg_receipt_time | timestamp with time zone | | |
latest_end_lsn | pg_lsn | | |
latest_end_time | timestamp with time zone | | |
slot_name | text | | |
sender_host | text | | |
sender_port | integer | | |
conninfo | text | | |
PG14 pg_stat_wal_receiver does not have received_lsn column
Acceptance criteria
-
Come up with a solution to resolve the postgres_exporter
query error(s) -
Apply the solution on the recently upgraded PG14 cluster(s)