Create forks of public projects using Git object pool
Problem to solve
Forking is slow and generates needless CPU, memory, and disk I/O load, because forks currently perform a full clone and only then deduplicate using the pool. This also means forking large projects is slow from a user's perspective.
Further details
With object deduplication implemented, forks of public projects can be created directly in a deduplicated state, requiring minimal resource usage.
Proposal
This would happen during or right after the creation of a forked project:
- SQL: create project linked to pool. Project is in "being cloned" state
- Gitaly: PoolService::PrepareCloneInPool
- Gitaly: PoolService::LinkRepositoryToPool
- SQL: clear project "being cloned" state
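The ordering of the steps above can be sketched as follows. This is only an illustration under stated assumptions: `Project`, `FakeGitaly`, and `fork_into_pool` are hypothetical names, the database row is modelled as an in-memory object, and the Gitaly stand-in merely records which RPC would be called. Only the RPC names `PoolService::PrepareCloneInPool` and `PoolService::LinkRepositoryToPool` come from the proposal; their real signatures are not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical stand-in for a project row in the SQL database."""
    name: str
    pool_id: str
    being_cloned: bool = True


@dataclass
class FakeGitaly:
    """Hypothetical stand-in that only records which RPCs would be called."""
    calls: list = field(default_factory=list)

    def prepare_clone_in_pool(self, project, pool_id):
        self.calls.append(("PoolService::PrepareCloneInPool", project.name, pool_id))

    def link_repository_to_pool(self, project, pool_id):
        self.calls.append(("PoolService::LinkRepositoryToPool", project.name, pool_id))


def fork_into_pool(name, pool_id, gitaly):
    # SQL: create the project linked to the pool, in the "being cloned" state.
    project = Project(name=name, pool_id=pool_id, being_cloned=True)
    # Gitaly: the two RPCs named in the proposal, in the proposed order.
    gitaly.prepare_clone_in_pool(project, pool_id)
    gitaly.link_repository_to_pool(project, pool_id)
    # SQL: clear the "being cloned" state only after both RPCs have returned.
    project.being_cloned = False
    return project
```

If either Gitaly call raises, the project in this sketch stays in the "being cloned" state, which is presumably what a caller would want to detect and clean up.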
The behavior is gated on the following conditions:
- Is the parent project in a pool?
- Is the new project public? Not applicable. Pool is created from the public upstream project only. We don't pool objects from downstream projects.
- Is the object deduplication feature enabled? Not applicable. Object deduplication is enabled by default (subject to hashed storage being enabled).
- Is the parent project using hashed storage?
- Is the new project using hashed storage?
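As a rough sketch, the conditions that still apply (the two marked "Not applicable" are dropped) reduce to a single predicate. `ProjectInfo` and `can_fork_into_pool` are hypothetical names chosen for illustration, and the fields are assumptions about how pool membership and hashed storage might be represented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectInfo:
    """Hypothetical view of the project attributes the gate needs."""
    pool_id: Optional[str]  # None when the project is not in a pool
    hashed_storage: bool


def can_fork_into_pool(parent: ProjectInfo, fork: ProjectInfo) -> bool:
    # The parent must already be in a pool, and both the parent and the
    # new project must use hashed storage.
    return (
        parent.pool_id is not None
        and parent.hashed_storage
        and fork.hashed_storage
    )
```

When the predicate is false, the fork would presumably fall back to the existing clone-then-deduplicate path.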
Additionally, the Sidekiq queues for object pools currently have a priority of 1. If and when we do fast forking using object pools, that priority should be raised substantially.