Create forks of public projects using Git object pool
Problem to solve
Forking is slow and generates needless CPU, memory, and disk I/O load, because a fork currently performs a full clone and only then deduplicates against the pool. As a result, forking large projects is also slow from a user's perspective.
Further details
With object deduplication implemented, a fork of a public project can be created directly in a deduplicated state, which requires minimal resource usage.
Proposal
This would happen during or right after the creation of a forked project:
- SQL: create project linked to pool. Project is in "being cloned" state
- Gitaly: PoolService::PrepareCloneInPool
- Gitaly: PoolService::LinkRepositoryToPool
- SQL: clear project "being cloned" state
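The ordering of the four steps above can be sketched as follows. This is only an illustration in Python, not GitLab or Gitaly code: `Project`, `RecordingGitaly`, and `fork_into_pool` are hypothetical names, and the stand-in client only records which of the two proposed RPCs were called. The point it shows is that the "being cloned" state is cleared only after both Gitaly calls have returned, so a failed call leaves the project visibly unfinished.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical project row; not the real GitLab model."""
    name: str
    pool_id: int
    being_cloned: bool = True  # step 1: created in the "being cloned" state


@dataclass
class RecordingGitaly:
    """Hypothetical stand-in for a Gitaly client; it only records calls."""
    calls: list = field(default_factory=list)

    def prepare_clone_in_pool(self, project):
        self.calls.append(("PrepareCloneInPool", project.name))

    def link_repository_to_pool(self, project):
        self.calls.append(("LinkRepositoryToPool", project.name))


def fork_into_pool(name, pool_id, gitaly):
    # Step 1 (SQL): create the project linked to the pool.
    project = Project(name=name, pool_id=pool_id)
    # Steps 2 and 3 (Gitaly): the two RPCs named in the proposal, in order.
    gitaly.prepare_clone_in_pool(project)
    gitaly.link_repository_to_pool(project)
    # Step 4 (SQL): clear the state only after both RPCs have returned.
    project.being_cloned = False
    return project
```

If either Gitaly call raises, `fork_into_pool` never reaches step 4, so the project stays in the "being cloned" state and can be retried or cleaned up.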
The behavior is gated on the following conditions:
- Is the parent project in a pool?
- Is the new project public? Not applicable: the pool is created from the public upstream project only. We don't pool objects from downstream projects.
- Is the object deduplication feature enabled? Not applicable: object deduplication is enabled by default (subject to hashed storage being enabled).
- Is the parent project using hashed storage?
- Is the new project using hashed storage?
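The conditions that still apply (the parent is in a pool, and both projects use hashed storage) could be combined into a single check. The sketch below is a Python illustration under assumed names: `ProjectInfo`, its fields, and `can_fork_into_pool` are hypothetical and do not correspond to actual GitLab attributes. The two conditions marked not applicable are left out on purpose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectInfo:
    """Hypothetical summary of the attributes the gate needs."""
    hashed_storage: bool
    pool_id: Optional[int] = None  # None means the project is not in a pool


def can_fork_into_pool(parent: ProjectInfo, fork: ProjectInfo) -> bool:
    # All remaining conditions from the list must hold; otherwise the
    # caller would fall back to the existing clone-then-deduplicate path.
    return (
        parent.pool_id is not None
        and parent.hashed_storage
        and fork.hashed_storage
    )
```

For example, `can_fork_into_pool(ProjectInfo(True, pool_id=7), ProjectInfo(True))` is true, while a parent without a pool or either project on legacy storage makes the check false.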
Additionally, the Sidekiq queues for object pools currently have a priority of 1. If and when fast forking using object pools is implemented, that priority should be raised considerably.
Links / references