Skip to content

praefect/router: Fix creation of repos with additional referenced repo

In order for Praefect to figure out where a specific request needs to be routed to we have introduced the target_repository gRPC extension. If a field is tagged with this extension, then it indicates to Praefect that it shall contain the repository to which the request needs to be routed. Furthermore, it also acts as a mechanism to support our repo rewriting mechanism where Praefect intercepts and rewrites the target repository so that the target storages may have a unique replica path.

The additional_repository extension is in the same spirit. But as requests can only be routed to a single node at once, its only intent is to indicate to Praefect that the contained repository path shall be rewritten to the replica path. Due to a set of limitations in Praefect, the target repository and additional repository must always be on the same storage as the translated replica paths are indeed storage-local. So if there were different target storages, then the target node would likely not even be able to handle them.

But while this restriction exists, we don't actually honor it ourselves when creating repositories via a request that has an additional repo set. Theoretically-speaking, it is thus possible that the new repository will be created on a set of storages that is partially or completely disjunct from the set of storages that host the preexisting additional repository.

There is only one such request now that is impacted by this, which is the CreateObjectPool RPC. So if the above bug is hit, then creation of the object pool will fail because Gitaly cannot locate the origin repo on its storage.

While theoretically a large issue, in practice it is likely not going to matter all that much because most setups use a replication factor that is in fact equal to the number of Gitaly nodes. Which means that we'd always create the object pool on the same set of nodes that already have the origin repository anyway.

But there's one more subtle issues here: we also didn't pay attention to the repository generation. So we'd be totally fine to create an object pool from a source repository that is known to be out-of-date on any of the secondaries. The end result could be inconsistently created oject pools.

Fix these bugs by implementing special routing for all repository creating RPCs that have an additional repository field. Like this, we will always use the same set of storages as the additional repository is assigned to. Furthermore, this also allows us to take generation numbers of the additional repository into account.

Fixes #5094 (closed).

Merge request reports