lock_waits metric: duplicate Prometheus series under concurrent same-table contention (session pid emitted as value, not tag) (#199) · Issues · PostgresAI / postgresai

lock_waits metric: duplicate Prometheus series under concurrent same-table contention (session pid emitted as value, not tag)

## Problem `lock_waits` metric emits two distinct lock-wait events as **duplicate Prometheus series** when both events share the same `(blocked_user, blocked_appname, blocked_mode, blocked_locktype, blocked_table, blocked_query_id, blocker_user, blocker_appname, blocker_mode, blocker_locktype, blocker_table, blocker_query_id)` tuple but involve different sessions (different `blocked_pid` / `blocker_pid`). The Prometheus client library in the pgwatch-prometheus sink rejects this: ``` [ERROR] [sink:prometheus] error gathering metrics: 4 error(s) occurred: * collected metric "pgwatch_lock_waits_blocked_ms" {... label:{name:"blocked_query_id" value:"2580257173810903337"} label:{name:"blocker_query_id" value:"4593154642546346557"} label:{name:"blocked_table" value:"loadtest.pgbench_accounts"} ...} was collected before with the same name and label values * collected metric "pgwatch_lock_waits_blocked_pid" ... was collected before with the same name and label values * collected metric "pgwatch_lock_waits_blocker_pid" ... was collected before with the same name and label values * collected metric "pgwatch_lock_waits_blocker_tx_ms" ... was collected before with the same name and label values ``` Effect: the entire 4 lock_waits series for that scrape are silently dropped from the response. (Independent metrics in the same scrape still land — this is per-metric not per-scrape.) ## Root cause In the rc.4 SQL (`config/pgwatch-prometheus/metrics.yml`, `lock_waits` block ~line 878), session pids are exported as **integer metric values** rather than tags: ```sql select blocked.pid as blocked_pid, -- integer → becomes a metric VALUE ... blocker.pid as blocker_pid, -- integer → becomes a metric VALUE ... ``` But the pgwatch-prometheus sink ([`internal/sinks/prometheus.go::MetricStoreMessageToPromMetrics`](https://github.com/cybertec-postgresql/pgwatch/blob/v3.7.0/internal/sinks/prometheus.go#L191)) only labels are: - columns whose name starts with `tag_`, OR - any `string`-typed column Integer columns become metric values, not labels. So `blocked_pid` and `blocker_pid` don't disambiguate the series. Meanwhile the SQL has 12+ other columns that already do become labels (some explicitly via `tag_*`, others implicitly because their type is `text` — e.g. `blocked_mode`, `blocked_locktype` are strings from `pg_locks.mode` / `pg_locks.locktype`). So when two concurrent transactions land in a lock-wait pair against the same table, with the same query templates and the same mode/locktype pairing, they produce identical label sets → duplicate-series collision. ## Reproduction 1. Stock `postgresai mon local-install --tag 0.15.0-rc.4 --db-url <supabase-pooler-url>` install monitoring a Supabase target with the postgres-ai-mon role. 2. Connect as a writable role (e.g. project `postgres` role) and run a TPC-B-style write workload that creates concurrent same-table contention. Reproduced cleanly with: ```bash PGOPTIONS="-c search_path=loadtest,public" pgbench \ -h <pooler> -U postgres.<project-ref> -d postgres \ -i -s 1 --no-vacuum --partitions=0 PGOPTIONS="-c search_path=loadtest,public" pgbench \ -h <pooler> -U postgres.<project-ref> -d postgres \ -n -c 5 -j 2 -T 600 ``` 3. Within ~30 s — 2 min of starting the write workload, `docker logs pgwatch-prometheus | grep "collected metric"` shows the duplicate-series errors. (Verified empirically on `78.47.229.109` during the validation of !267.) ## Suggested fix Promote `blocked_pid` and `blocker_pid` to tags (labels) by renaming with the `tag_` prefix and casting to text in the SQL — `pid` is identity, not a time-series value: ```diff select - blocked.pid as blocked_pid, + blocked.pid::text as tag_blocked_pid, ... - blocker.pid as blocker_pid, + blocker.pid::text as tag_blocker_pid, ``` `blocked_pid` / `blocker_pid` as numeric gauges aren't dashboard-relevant anyway — nobody graphs `rate(blocked_pid)`; the useful question is "which session pids are blocking which." Moving them to labels makes them disambiguators *and* keeps them queryable from Grafana/PromQL. Same fix probably needs review for other metrics that expose pid as a value — at minimum `xmin_horizon_blockers` (haven't checked, just guessing from the pattern). ## Distinction from other open issues - **Not #195** (cap shape on per-relation metrics — !267 covers that). - **Not #198** (Prometheus sink drain wipe — !266 covers that). - Independent bug, only surfaces under concurrent-write load against the same target table. Likely latent on monitored production DBs that have any concurrent writers and lock contention. ## Refs - Discovered during empirical validation of !267 against the Supabase pooler under `pgbench` TPC-B load on `78.47.229.109`. - pgwatch-prometheus duplicate-series rejection: a Prometheus client library hard error, not a config tuning issue. ## Acceptance 1. Two concurrent lock-wait events between different session pids no longer produce duplicate Prometheus series. 2. The `pgwatch_lock_waits_blocked_pid` / `pgwatch_lock_waits_blocker_pid` PromQL/Grafana behaviour preserved (or made better) — pids queryable via `pgwatch_lock_waits_blocked_ms{tag_blocked_pid="…"}` instead of needing the value-side `_blocked_pid` series. 3. Repro from above runs for 5+ min with the patched metric and no `collected metric … was collected before` errors in `docker logs pgwatch-prometheus`.

issue