Change Request :: Add Vacuum metrics to Prometheus
C1
Production Change - Criticality 1Change Objective | Add Vacuum metrics to Prometheus |
---|---|
Change Type | ConfigurationChange (postgres_exporter configuration) |
Services Impacted | ~"Service:postgres,postgres_exporter" |
Change Team Members | @kadaffy @emanuel_ongres |
Change Severity | C1 |
Change Reviewer | @emanuel_ongres |
Tested in staging | Evidence or assertion the change was tested on staging environment |
Dry-run output | |
Due Date | |
Time tracking | |
Downtime Component |
Issue: 8494
Objetive
The purpose of this Change Request is to add vacuum metrics to Prometheus endpoint through configuration changes in each Patroni's postgres_exporter service.
Merge Request to omnibus-gitlab
sent gitlab-org/omnibus-gitlab#4887 (closed).
Nodes
The changes should be apply on all the patroni cluster nodes:
- patroni-01-db-gprd.c.gitlab-production.internal
- patroni-02-db-gprd.c.gitlab-production.internal
- patroni-03-db-gprd.c.gitlab-production.internal
- patroni-04-db-gprd.c.gitlab-production.internal
- patroni-05-db-gprd.c.gitlab-production.internal
- patroni-06-db-gprd.c.gitlab-production.internal
- patroni-07-db-gprd.c.gitlab-production.internal
- patroni-08-db-gprd.c.gitlab-production.internal
- patroni-09-db-gprd.c.gitlab-production.internal
- patroni-10-db-gprd.c.gitlab-production.internal
- patroni-11-db-gprd.c.gitlab-production.internal
- patroni-12-db-gprd.c.gitlab-production.internal
Steps for each cluster node
-
Add the following block to the
/opt/prometheus/postgres_exporter/queries.yaml
file.- Merge Request already requested gitlab-org/omnibus-gitlab!3771 (merged)
pg_vacuum_queue:
master: true
query: |
with table_opts as (
select
pg_class.oid,
relname,
nspname,
array_to_string(reloptions, '') as relopts
from pg_class
join pg_namespace ns on relnamespace = ns.oid
), vacuum_settings as (
select
oid,
relname,
nspname,
case
when relopts like '%autovacuum_vacuum_threshold%' then regexp_replace(relopts, '.*autovacuum_vacuum_threshold=([0-9.]+).*', E'\\1')::int8
else current_setting('autovacuum_vacuum_threshold')::int8
end as autovacuum_vacuum_threshold,
case
when relopts like '%autovacuum_vacuum_scale_factor%' then regexp_replace(relopts, '.*autovacuum_vacuum_scale_factor=([0-9.]+).*', E'\\1')::numeric
else current_setting('autovacuum_vacuum_scale_factor')::numeric
end as autovacuum_vacuum_scale_factor,
case when relopts ~ 'autovacuum_enabled=(false|off)' then false else true end as autovacuum_enabled
from table_opts
), p as (
select *
from pg_stat_progress_vacuum
)
select
--vacuum_settings.oid,
coalesce(
coalesce(nullif(vacuum_settings.nspname, 'public') || '.', '') || vacuum_settings.relname, -- current DB
format('[something in "%I"]', p.datname)
) as table_name,
round((100 * psat.n_dead_tup::numeric / nullif(pg_class.reltuples, 0))::numeric, 2) as dead_tup_pct,
pg_class.reltuples::numeric,
psat.n_dead_tup,
(relpages::bigint*8*1024) AS table_size_bytes,
'vt: ' || vacuum_settings.autovacuum_vacuum_threshold
|| ', vsf: ' || vacuum_settings.autovacuum_vacuum_scale_factor
|| case when not autovacuum_enabled then ', DISABLED' else ', enabled' end as "effective_settings",
case
when last_autovacuum > coalesce(last_vacuum, '0001-01-01') then left(last_autovacuum::text, 19)
when last_vacuum is not null then left(last_vacuum::text, 19)
else null
end as "last_vacuumed",
case
when last_autovacuum > coalesce(last_vacuum, '0001-01-01') then 'autovacuum'
when last_vacuum is not null then 'manual'
else null
end as "type",
coalesce(p.phase, 'in queue') as status,
p.pid as pid,
case
when a.query ~ '^autovacuum.*to prevent wraparound' then 'wraparound'
when a.query ~ '^vacuum' then 'user'
when a.pid is null then null
else 'regular'
end as mode,
case when a.pid is null then null else coalesce(wait_event_type ||'.'|| wait_event, 'f') end as waiting,
round(100.0 * p.heap_blks_scanned / nullif(p.heap_blks_total, 0), 1) AS scanned_pct,
round(100.0 * p.heap_blks_vacuumed / nullif(p.heap_blks_total, 0), 1) AS vacuumed_pct,
p.index_vacuum_count,
case
when psat.relid is not null and p.relid is not null then
(select count(*) from pg_index where indrelid = psat.relid)
else null
end as index_count
from pg_stat_all_tables psat
join pg_class on psat.relid = pg_class.oid
join vacuum_settings on pg_class.oid = vacuum_settings.oid
full outer join p on p.relid = psat.relid and p.datname = current_database()
left join pg_stat_activity a using (pid)
where
pg_class.relpages >= 8
and psat.relid is null
or autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor::numeric * pg_class.reltuples) < psat.n_dead_tup
metrics:
- table_name:
usage: "LABEL"
description: "Table name"
- dead_tup_pct:
usage: "GAUGE"
description: "Estimated dead tuples percent"
- reltubles:
usage: "GAUGE"
description: "Number of tuples in table"
- n_dead_tup:
usage: "GAUGE"
description: "Estimated number of dead tuples"
- table_size_bytes:
usage: "GAUGE"
description: "Estimated table size"
- effective_settings:
usage: "LABEL"
description: "Autovacuums settings"
- last_vacuumed:
usage: "LABEL"
description: "Last time at which this table was vacuumed"
- type:
usage: "LABEL"
description: "Last vacuum type"
- status:
usage: "LABEL"
description: "Vacuum actual status"
- pid:
usage: "GAUGE"
description: "Vacuum process id"
- mode:
usage: "LABEL"
description: "Type of vacuum executed"
- waiting:
usage: "LABEL"
description: "Vacuum queue status"
- scanned_pct:
usage: "GAUGE"
description: "Estimated rows scanned percent"
- vacuumed_pct:
usage: "GAUGE"
description: "Estimated vacuumed rows percent"
- index_vacuum_count:
usage: "GAUGE"
description: "Number of index to vacuum"
- index_count:
usage: "GAUGE"
description: "Number of table indexes"
-
Restart postgres_exporter service
sudo service postgres_exporter restart
Checks
-
Check if the services start successfully
- Check the postgres_exporter process
ps -aux | grep postgres_exporter
- Check log file
/var/log/prometheus/postgres_exporter
if theres any errors related with the change
-
Check metrics from prometheus
- In the Expression text-box check if these metrics are listed:
- pg_stats_vacuums_dead_tup_pct
- pg_stats_vacuums_index_count
- pg_stats_vacuums_index_vacuum_count
- pg_stats_vacuums_n_dead_tup
- pg_stats_vacuums_pid
- pg_stats_vacuums_reltuples
- pg_stats_vacuums_scanned_pct
- pg_stats_vacuums_table_size_bytes
- pg_stats_vacuums_vacuumed_pct
Rollback changes
- Revert gitlab-org/omnibus-gitlab#4887 (closed)
- Restart postgres_exporter services
sudo service postgres_exporter restart