Praefect: "cannot execute SELECT FOR UPDATE in a read-only transaction"
Hi,
we run GitLab in a K8s cluster and use a PostgreSQL cluster managed by Patroni. When Patroni switches the PostgreSQL primary node, the GitLab webservice keeps working without a problem, but Praefect does not. In such cases we always see the following:
praefect time="2021-04-21T06:00:49Z" level=error msg="failed to dequeue replication events" component=replication_manager error="query: pq: cannot execute SELECT FOR UPDATE in a read-only transaction" pid=39 replication_job_target=gitlab-gitaly-default-1 vir
praefect time="2021-04-21T06:00:50Z" level=error msg="unable to begin a database transaction" error="pq: cannot set transaction read-write mode during recovery" pid=39 praefectName="gitlab-praefect-1:0.0.0.0:8075" virtual_storage=default
From this point on, we have to restart the Praefect pods. Praefect then reconnects to the database and everything works again.
It would be great if Praefect could detect this kind of database problem by itself. I don't know why this happens. As I wrote, the GitLab webservice does not have this problem; only Praefect does.
We also see that in such cases the GitLab health checks report success, so our monitoring shows "green" although the system is not in a healthy state.