Automatically repair repositories
Problem to solve
When a primary goes offline while it holds changes that haven't been replicated, the affected repositories are marked as read only (#2630 (closed)) to prevent data loss. If the node comes back online, GitLab neither recovers automatically nor gives admins a way to recover other than accepting the data loss.
Praefect could fail over in error when the primary was only experiencing temporary failures, and put a large number of repositories into read-only mode. This would be frustrating for an SRE to address because they would rather have taken 30 seconds of outage than have a failover, thousands of projects marked as read only, and no obvious way to resolve the situation.
A repository on a secondary node might get out of sync if a replication job fails to be processed. The repository stays outdated until another write comes in and triggers a replication job.
When Praefect notices an outdated repository and a more up-to-date copy is present on an available node, Praefect schedules a replication job to bring the outdated repository up to date again.
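The repair rule above can be sketched in Go as follows. This is only an illustration, not Praefect's actual code: the `replica` and `replicationJob` types, the `planRepairs` function, and the idea of comparing a per-replica `generation` counter are all assumptions made for this example. The issue does not specify how Praefect detects that a repository is outdated.

```go
package main

import "fmt"

// replica is a hypothetical view of one copy of a repository on one storage node.
type replica struct {
	node       string
	generation int  // assumed write counter; a higher value means more up to date
	available  bool // whether the node is currently reachable
}

// replicationJob is a hypothetical repair job: copy the repository from source to target.
type replicationJob struct {
	source, target string
}

// planRepairs picks the most up-to-date replica on an available node as the
// source and returns one job for every replica that is behind it. It returns
// nil when no node is available. Target availability is not checked here;
// a real scheduler might defer jobs for unreachable targets.
func planRepairs(replicas []replica) []replicationJob {
	best := -1
	for i, r := range replicas {
		if r.available && (best == -1 || r.generation > replicas[best].generation) {
			best = i
		}
	}
	if best == -1 {
		return nil
	}

	var jobs []replicationJob
	for _, r := range replicas {
		if r.generation < replicas[best].generation {
			jobs = append(jobs, replicationJob{source: replicas[best].node, target: r.node})
		}
	}
	return jobs
}

func main() {
	// Node names are made up for the example.
	replicas := []replica{
		{node: "gitaly-1", generation: 5, available: true},
		{node: "gitaly-2", generation: 3, available: true},
		{node: "gitaly-3", generation: 5, available: false},
	}
	for _, j := range planRepairs(replicas) {
		fmt.Printf("%s -> %s\n", j.source, j.target)
	}
}
```

In this sketch a replica that is ahead of every available node (for example, an offline former primary) is never chosen as a source and is never overwritten, which matches the requirement that the more up-to-date copy must be on an available node.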