Deduplication of private forks
Problem to solve
Object deduplication is especially useful for self-hosted GitLab instances that use a forking workflow, because it reduces storage costs. However, projects on self-hosted instances are typically not public, and object deduplication is currently limited to public projects.
Further details
Currently, objects are only shared from the upstream project to the downstream forks.
graph TD
Pool-->|alternate|Upstream
Upstream-->|gc|Pool
Pool-->|alternate|A[Fork A]
Pool-->|alternate|B[Fork B]
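The relationships in the diagram above can be sketched as a small in-memory model. This is only an illustration, not GitLab's or Gitaly's implementation: the class names `Repository` and `Pool`, the method `gc_from`, and the object IDs are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical object pool: a shared store of Git object IDs."""
    objects: set[str] = field(default_factory=set)

    def gc_from(self, upstream: Repository) -> None:
        # Only the upstream feeds objects into the pool (the "gc" edge).
        self.objects |= upstream.objects


@dataclass
class Repository:
    """Hypothetical repository that may borrow objects from a pool."""
    name: str
    objects: set[str] = field(default_factory=set)  # objects stored locally
    alternate: Pool | None = None  # the "alternate" edge in the diagram

    def has_object(self, oid: str) -> bool:
        # Look locally first, then fall back to the pool via the alternate.
        if oid in self.objects:
            return True
        return self.alternate is not None and oid in self.alternate.objects


pool = Pool()
upstream = Repository("upstream", {"a1", "b2", "c3"}, alternate=pool)
pool.gc_from(upstream)

# Forks store only their own objects and read the rest through the pool.
fork_a = Repository("fork-a", {"d4"}, alternate=pool)
fork_b = Repository("fork-b", set(), alternate=pool)
```

In this sketch, `fork_a.has_object("a1")` succeeds through the pool, while the fork-only object `d4` never reaches the pool, which mirrors the one-way sharing described above.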
If the upstream is private when the fork is created, the owner of the fork must have had access to the upstream at that time.
For a public upstream project, the only event that affects the pool is the upstream being made internal or private, at which point it should be removed from the pool and a fork should be promoted.
However, if the upstream is private or internal, the situation is more complicated: currently, the owner of Fork A could lose access to the upstream (for example, through a permissions change such as being removed from a group) while still retaining access to their copy of the repository.
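The public-upstream case could be handled by a single hook along these lines. This is a minimal sketch under stated assumptions: `ObjectPool`, `on_visibility_change`, and the rule of promoting the longest-standing fork are hypothetical, since the text does not say which fork is promoted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ObjectPool:
    """Hypothetical pool record: one source project plus linked forks."""
    source: str | None  # project whose objects feed the pool
    members: list[str] = field(default_factory=list)  # forks, in join order


def on_visibility_change(pool: ObjectPool, project: str, new_visibility: str) -> None:
    """Hypothetical hook called when a project's visibility changes."""
    if project != pool.source or new_visibility == "public":
        return  # only the upstream leaving "public" affects the pool
    # The upstream is now internal or private: drop it from the pool and
    # promote a fork. Picking the first member is an assumption.
    pool.source = pool.members.pop(0) if pool.members else None


pool = ObjectPool(source="upstream", members=["fork-a", "fork-b"])
on_visibility_change(pool, "upstream", "private")
```

After the call, `fork-a` is the pool's source and `fork-b` remains a member.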
- Option A: add or remove forks from object pools every time permissions change. This creates a situation where a fork may or may not be part of a pool, depending on changing permissions.
- Option B: require the owner of the fork to always have access to the upstream, or else lose access to the whole project. This pushes the permissions problem to the application layer (see gitlab#21881).
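Option B can be illustrated as an access check in which fork access is conditional on upstream access. This is a sketch only: the function `can_access_fork` and the sets of user names are hypothetical and do not reflect how gitlab#21881 would be implemented.

```python
def can_access_fork(user: str, fork_access: set[str], upstream_access: set[str]) -> bool:
    """Option B sketch: a user may read the fork only while they can
    also read the upstream. Both sets are hypothetical stand-ins for
    the application's real permission checks."""
    return user in fork_access and user in upstream_access


fork_access = {"alice"}
upstream_access = {"alice", "bob"}

before = can_access_fork("alice", fork_access, upstream_access)

# Alice is removed from the upstream (for example, removed from the group).
upstream_access.discard("alice")
after = can_access_fork("alice", fork_access, upstream_access)
```

Under this rule the fork's pool membership never has to change when permissions do, because anyone who can read the fork can, by construction, also read the upstream objects it borrows.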
We should pursue Option B because it solves not only the deduplication problem but other problems as well.
Proposal
Subject to the implementation of gitlab#21881
- enable the deduplication of forks of private projects