Postgres monitoring: ensure that lag is monitored for all slots (both physical and logical)
In production#7969 (closed), we ran an experiment in gstg to test converting a physical standby node to a logical one. After the logical standby had been up and running for a few days, its logical slot had accumulated almost 50 GiB of lag: as expected, an incoming DDL blocked replication.
It looks like such lag is invisible to current monitoring and alerts. On the PostgreSQL overview dashboard (https://dashboards.gitlab.net/d/000000144/postgresql-overview?orgId=1&from=1668384000000&to=1668729599000) there are signs of the problem, such as growing disk space consumption, but the lag graphs don't show it.
We need to double-check how lag is monitored (the metric collected on the primary/publisher, based on pg_replication_slots and measured in bytes) and fix it if needed.
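For reference while checking the metric, here is a minimal Python sketch of a per-slot lag calculation covering both physical and logical slots. It is not the query our exporter actually uses. It assumes lag is measured as the distance in bytes between the current WAL position on the primary and each slot's restart_lsn, and that the rows come from the query in SLOT_QUERY (PostgreSQL 10+ function names). The helper names (lsn_to_int, slot_lags, lagging_slots) and the threshold parameter are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Assumed query, to be run on the primary/publisher. It returns one row per
# slot, regardless of slot_type, so logical slots are not filtered out.
SLOT_QUERY = """
SELECT slot_name, slot_type, active,
       pg_current_wal_lsn()::text AS current_lsn,
       restart_lsn::text AS restart_lsn
FROM pg_replication_slots
"""

# (slot_name, slot_type, active, current_lsn, restart_lsn)
Row = Tuple[str, str, bool, str, Optional[str]]


class SlotLag(NamedTuple):
    slot_name: str
    slot_type: str
    active: bool
    lag_bytes: int


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slot_lags(rows: Iterable[Row]) -> List[SlotLag]:
    """Compute lag in bytes for every slot that has a restart_lsn."""
    lags = []
    for slot_name, slot_type, active, current_lsn, restart_lsn in rows:
        if restart_lsn is None:
            # A slot that has never been used retains no WAL yet.
            continue
        lag = lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)
        lags.append(SlotLag(slot_name, slot_type, active, lag))
    return lags


def lagging_slots(rows: Iterable[Row], threshold_bytes: int) -> List[SlotLag]:
    """Return the slots whose lag is at or above the given threshold."""
    return [s for s in slot_lags(rows) if s.lag_bytes >= threshold_bytes]
```

Feeding the result of SLOT_QUERY into lagging_slots with a threshold of, say, a few GiB should flag a stuck logical slot like the one from the experiment. If it does while the dashboard stays flat, the existing metric is probably filtering on slot type or only looking at pg_stat_replication.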

