DR Site Patroni won't stay replicated
The runbook to resynchronize the DR site database from the master using WAL replication works. It takes a day or so, but it ends up replicated. https://gitlab.com/gitlab-com/runbooks/blob/master/howto/geo-patroni-cluster.md
However, it has never stayed synchronized. We have to manually resync it every week or two. This will not work when we go live. It needs to stay replicated on its own.
This time, it looks like something is going on with Patroni. All three DR members are stuck in the starting state with no role, no timeline, and unknown lag:
+---------------+---------------------------------------+--------------+------+----------+----+-----------+
| Cluster | Member | Host | Role | State | TL | Lag in MB |
+---------------+---------------------------------------+--------------+------+----------+----+-----------+
| pg-ha-cluster | patroni-01-db-dr.c.gitlab-dr.internal | 10.251.9.101 | | starting | | unknown |
| pg-ha-cluster | patroni-02-db-dr.c.gitlab-dr.internal | 10.251.9.102 | | starting | | unknown |
| pg-ha-cluster | patroni-03-db-dr.c.gitlab-dr.internal | 10.251.9.103 | | starting | | unknown |
+---------------+---------------------------------------+--------------+------+----------+----+-----------+
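To catch this earlier than the next manual check, the table above could be parsed and checked automatically. The following is a minimal sketch, not part of the runbook: it assumes the output format shown above, and the function names and the set of states treated as healthy are hypothetical (the exact state strings may differ between Patroni versions).

```python
def parse_patroni_table(text):
    """Parse the ASCII table shown above into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +---+ border lines and anything else
        # Drop the empty strings produced by the leading and trailing pipes.
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if header is None:
            header = cells  # first pipe-delimited line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def unhealthy_members(rows, healthy_states=("running", "streaming")):
    """Return names of members in an unexpected state or with unknown lag.

    healthy_states is an assumption; adjust it to the Patroni version in use.
    """
    bad = []
    for row in rows:
        state_ok = row.get("State") in healthy_states
        lag_ok = row.get("Lag in MB") != "unknown"
        if not (state_ok and lag_ok):
            bad.append(row.get("Member"))
    return bad
```

Feeding it the output above would flag all three DR members, since each is in the starting state with unknown lag. A check like this could run on a schedule and alert when the returned list is non-empty.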
@Finotto @dawsmith @abrandl - can we get Ongres or someone more familiar with our database setup to take a look at this and make some recommendations?