Observe which projects suffered data loss from failover
Problem to solve
As an administrator, when a failover occurs, I want to know which repositories have experienced data loss. This gives me an indication of which developers have been affected and where to target efforts to recover missing data where possible. For example, I could let a team know that they might need to re-push their feature branches.
Further details
At a minimum, we should be able to identify which specific projects suffered data loss caused by replication operations that could not be completed because the replication queue was incomplete.
A possible future improvement would be to summarize the number of replication jobs lost for each repository, and perhaps even the count per ref within each repository.
Proposal
Store failed replication jobs for some period of time in the Praefect database, and provide a method for reporting on this.
Perhaps the MVC would be a command that can be run on any Praefect server to:
- dump all the failed replication jobs (this could be useful for analysis)
- summarize failed replication jobs
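
As a rough illustration of the summarize step, the following Go sketch groups failed replication jobs by repository and counts them. It is not Praefect's actual data model or API: the `FailedJob` and `RepoSummary` types, their fields, and the `Summarize` function are all hypothetical, and the jobs are assumed to have already been loaded from wherever they are stored.

```go
package main

import (
	"fmt"
	"sort"
)

// FailedJob is a hypothetical record of a replication job that could not
// be completed. The fields are assumptions, not Praefect's real schema.
type FailedJob struct {
	VirtualStorage string // virtual storage the repository belongs to
	RelativePath   string // repository path within the storage
	TargetStorage  string // storage node that did not receive the change
	Change         string // kind of change, e.g. "update"
}

// RepoSummary is a hypothetical per-repository count of lost jobs.
type RepoSummary struct {
	VirtualStorage string
	RelativePath   string
	LostJobs       int
}

// Summarize counts failed jobs per repository and returns the result
// sorted by descending job count, then by storage and path so that the
// output is deterministic.
func Summarize(jobs []FailedJob) []RepoSummary {
	type key struct{ storage, path string }

	counts := make(map[key]int)
	for _, j := range jobs {
		counts[key{j.VirtualStorage, j.RelativePath}]++
	}

	out := make([]RepoSummary, 0, len(counts))
	for k, n := range counts {
		out = append(out, RepoSummary{
			VirtualStorage: k.storage,
			RelativePath:   k.path,
			LostJobs:       n,
		})
	}

	sort.Slice(out, func(a, b int) bool {
		if out[a].LostJobs != out[b].LostJobs {
			return out[a].LostJobs > out[b].LostJobs
		}
		if out[a].VirtualStorage != out[b].VirtualStorage {
			return out[a].VirtualStorage < out[b].VirtualStorage
		}
		return out[a].RelativePath < out[b].RelativePath
	})
	return out
}

func main() {
	// Made-up example data, for illustration only.
	jobs := []FailedJob{
		{"default", "group/project-a.git", "gitaly-2", "update"},
		{"default", "group/project-b.git", "gitaly-2", "update"},
		{"default", "group/project-a.git", "gitaly-3", "update"},
	}

	for _, s := range Summarize(jobs) {
		fmt.Printf("%s\t%s\t%d\n", s.VirtualStorage, s.RelativePath, s.LostJobs)
	}
}
```

The dump option would print the `FailedJob` records as-is, while the summary tells an administrator which repositories to look at first. A per-ref count would need the ref name recorded on each job, which this sketch does not assume.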