Geo: Improve replication status
From https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/1610#note_38298784, @brodock mentions:
The method you are using to check replication lag is only reliable when there is a replication lag, if the log shipping stops, it will say "0" but in fact it's not replicating anymore it's just that the amount of logs in the secondary node has catched up.
While this is better than nothing, we either explictly says that when it's zero, or we also need to compare with the state on the master as well.
Master:
SELECT pg_current_xlog_location();
Slave:
SELECT pg_last_xlog_receive_location();
I've also found this: https://www.enterprisedb.com/blog/monitoring-approach-streaming-replication-hot-standby-postgresql-93
They mention here a select * from pg_stat_replication; we can run on primary that gives us information on the replication on the secondary nodes.
This is what I see on my test machine:
template1=# SELECT pg_current_xlog_location();
pg_current_xlog_location
--------------------------
360/B0000060
(1 row)
template1=# select * from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | w
rite_location | flush_location | replay_location | sync_priority | sync_state
------+----------+-------------------+------------------+--------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+--
--------------+----------------+-----------------+---------------+------------
6193 | 16386 | gitlab_replicator | walreceiver | 10.xxx.xxx.xxx | | 60674 | 2017-08-19 23:49:55.670597+00 | | streaming | 360/B0000060 | 3
60/B0000060 | 360/B0000060 | 360/B0000060 | 0 | async
so by comparing both the 360/B0000060 we know for sure a "zero" lag also means they are not zero because we are failling to ship logs.