Test Praefect against network partitions
A well-known tool for validating data storage systems (PostgreSQL, Redis, Elasticsearch, etc.) is Jepsen. It helps in understanding how distributed systems fail and how a given system deals with those failures.
The Gitaly team has built Praefect, which has never been properly validated by the QA team. While there are unit and integration tests covering these cases, there is no black-box testing of a Praefect cluster with multiple Gitaly nodes.
To be tested (see also the Jepsen link):
- Disk failures
- Bit rot
- Network partitions (short-lived and long-lived)
- Transactional writes, and how to break them
- Out-of-order transactional votes
- Networks with high packet loss
- etc.
Edited by Zeger-Jan van de Weg