
Set replication factor for a repository

Sami Hiltunen requested to merge smh-set-replication-factor into master

This MR implements functionality to set a replication factor for a repository (&3372 (closed)).

The administrator sets a desired replication factor for a repository. The replication factor must be at least one, which means the repository is present only on the primary node. The maximum replication factor is limited by the number of storage nodes in the virtual storage, as replicating to more nodes would not be possible.
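The bounds described above can be sketched as a small validation function. This is a minimal illustration and not the code in this MR: `validateReplicationFactor`, the error messages, and the storage names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errBelowMinimum is a hypothetical error for a replication factor below one.
var errBelowMinimum = errors.New("replication factor must be at least 1")

// validateReplicationFactor checks a desired replication factor against the
// bounds from the description: at least one (primary only) and at most the
// number of storages configured in the virtual storage.
func validateReplicationFactor(desired int, configuredStorages []string) error {
	if desired < 1 {
		return errBelowMinimum
	}
	if limit := len(configuredStorages); desired > limit {
		return fmt.Errorf("replication factor %d exceeds the %d storages in the virtual storage", desired, limit)
	}
	return nil
}

func main() {
	// Hypothetical storage names for a three-node virtual storage.
	storages := []string{"gitaly-1", "gitaly-2", "gitaly-3"}
	for _, rf := range []int{0, 1, 3, 4} {
		fmt.Printf("replication factor %d: %v\n", rf, validateReplicationFactor(rf, storages))
	}
}
```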

Praefect picks random storages and assigns them to host the repository until the desired replication factor is met. Random selection should, on the whole, balance the repositories across the nodes. Manual assignments are not supported. Ideally, we'll improve our balancing logic in the future from random to smart to avoid needing manual assignments.
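The random selection can be pictured roughly as follows. This is a sketch under assumptions and not how Praefect actually implements it: `assignRandomly` and the storage names are hypothetical, and the shuffle is passed in as a parameter (with the same signature as `rand.Shuffle`) only so the behaviour can be made deterministic in tests.

```go
package main

import (
	"fmt"
	"math/rand"
)

// assignRandomly returns the storages assigned to a repository after adding
// randomly ordered candidates until the desired replication factor is met.
// Existing assignments are kept; this sketch never removes assignments.
func assignRandomly(shuffle func(n int, swap func(i, j int)), assigned, configured []string, desired int) []string {
	isAssigned := make(map[string]bool, len(assigned))
	for _, s := range assigned {
		isAssigned[s] = true
	}

	// Candidates are the configured storages not yet hosting the repository.
	var candidates []string
	for _, s := range configured {
		if !isAssigned[s] {
			candidates = append(candidates, s)
		}
	}

	shuffle(len(candidates), func(i, j int) {
		candidates[i], candidates[j] = candidates[j], candidates[i]
	})

	result := append([]string(nil), assigned...)
	for _, s := range candidates {
		if len(result) >= desired {
			break
		}
		result = append(result, s)
	}
	return result
}

func main() {
	// Hypothetical setup: the repository lives only on its primary, gitaly-1.
	configured := []string{"gitaly-1", "gitaly-2", "gitaly-3"}
	fmt.Println(assignRandomly(rand.Shuffle, []string{"gitaly-1"}, configured, 2))
}
```

With a replication factor of two, the result is the primary plus one of the two remaining storages, chosen at random.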

Variable replication factor is implemented in a backwards-compatible manner. Any repositories that do not have assignments set are replicated to every storage node in the virtual storage. As such, it is possible to opt repositories in to variable replication factor one by one. By default, newly created repositories do not have a replication factor set and are replicated to every node in the virtual storage. A follow-up MR will add support for setting a default replication factor for new repositories.
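The backwards-compatible fallback amounts to a single rule, sketched below. The function name `hostStorages` and the storage names are hypothetical and chosen for illustration only.

```go
package main

import "fmt"

// hostStorages returns the storages that should hold a replica of a
// repository. A repository without any assignments has not been opted in to
// variable replication factor, so it falls back to every configured storage.
func hostStorages(assignments, configured []string) []string {
	if len(assignments) == 0 {
		return append([]string(nil), configured...)
	}
	return append([]string(nil), assignments...)
}

func main() {
	// Hypothetical storage names.
	configured := []string{"gitaly-1", "gitaly-2", "gitaly-3"}

	// No assignments: replicated to every storage in the virtual storage.
	fmt.Println(hostStorages(nil, configured))

	// Opted in: replicated only to the assigned storages.
	fmt.Println(hostStorages([]string{"gitaly-1", "gitaly-3"}, configured))
}
```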

To avoid problems stemming from differing configurations between Praefect nodes or storage nodes removed from the virtual storage, Praefect only considers the configured storage nodes when creating or removing assignments. As such, the replication factor may end up being higher than intended but never lower.
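One way to read this rule: assignments on storages missing from the local configuration are neither counted nor removed, so they can only push the actual number of replicas above the desired factor. The sketch below illustrates that reading; `partitionAssignments` and the storage names are hypothetical, and the real logic may differ in detail.

```go
package main

import "fmt"

// partitionAssignments splits existing assignments into those on configured
// storages, which count towards the replication factor, and those on
// storages absent from the configuration, which are left untouched.
func partitionAssignments(assignments, configured []string) (counted, ignored []string) {
	isConfigured := make(map[string]bool, len(configured))
	for _, s := range configured {
		isConfigured[s] = true
	}
	for _, s := range assignments {
		if isConfigured[s] {
			counted = append(counted, s)
		} else {
			ignored = append(ignored, s)
		}
	}
	return counted, ignored
}

func main() {
	// Hypothetical state: gitaly-old was removed from the configuration but
	// still has an assignment recorded.
	configured := []string{"gitaly-1", "gitaly-2", "gitaly-3"}
	assignments := []string{"gitaly-1", "gitaly-2", "gitaly-old"}

	counted, ignored := partitionAssignments(assignments, configured)
	fmt.Println("counted towards the factor:", counted)
	fmt.Println("left untouched:", ignored)
}
```

With a desired replication factor of two, the two counted assignments already satisfy it, while the untouched third assignment means three assignments exist in total: higher than intended, but not lower.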

Praefect considers assignments only when using the repository-specific primary stack. When using other election strategies, the Coordinator will still replicate the repository to every node. This is unlikely to be addressed, as the goal is to remove the other election strategies.

Related to #2971 (closed)
