Create forks of public projects using Git object pool

Problem to solve

Forking is slow and wastes CPU, memory, and disk I/O, because a fork currently performs a full clone and only then deduplicates against the pool. As a result, forking a large project is also slow from the user's perspective.

Further details

With object deduplication implemented, forks of public projects can be created directly in a deduplicated state, which requires minimal resource usage.

Proposal

This would happen during or right after the creation of a forked project:

  • SQL: create project linked to pool. Project is in "being cloned" state
  • Gitaly: PoolService::PrepareCloneInPool
  • Gitaly: PoolService::LinkRepositoryToPool
  • SQL: clear project "being cloned" state
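
The four steps above can be sketched as follows. This is only an illustration of the ordering, not the actual implementation: `Project`, `RecordingGitaly`, and `fork_into_pool` are hypothetical names, the database row is replaced by an in-memory object, and the Gitaly calls are replaced by a stub that records the RPC names listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Project:
    # Hypothetical in-memory stand-in for a row in the projects table.
    name: str
    pool_id: Optional[int] = None
    being_cloned: bool = False


@dataclass
class RecordingGitaly:
    # Hypothetical stub: records which RPCs would be issued, in order.
    calls: List[str] = field(default_factory=list)

    def prepare_clone_in_pool(self, project: Project) -> None:
        self.calls.append("PoolService::PrepareCloneInPool")

    def link_repository_to_pool(self, project: Project) -> None:
        self.calls.append("PoolService::LinkRepositoryToPool")


def fork_into_pool(name: str, pool_id: int, gitaly: RecordingGitaly) -> Project:
    # Step 1 (SQL): create the project linked to the pool, in "being cloned" state.
    project = Project(name=name, pool_id=pool_id, being_cloned=True)
    # Steps 2 and 3 (Gitaly): prepare the clone in the pool, then link it.
    gitaly.prepare_clone_in_pool(project)
    gitaly.link_repository_to_pool(project)
    # Step 4 (SQL): clear the "being cloned" state once both calls returned.
    project.being_cloned = False
    return project
```

In this sketch the "being cloned" state is cleared only after both Gitaly calls return, so an exception in either call would leave the state untouched. Whether the real implementation should behave that way is an open design question.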

The behavior is gated on the following conditions:

  1. Is the parent project in a pool?

  2. Is the new project public?

    Not applicable. Pool is created from the public upstream project only. We don't pool objects from downstream projects.

  3. Is the object deduplication feature enabled?

    Not applicable. Object deduplication is enabled by default (subject to hashed storage being enabled).

  4. Is the parent project using hashed storage?

  5. Is the new project using hashed storage?
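
Since conditions 2 and 3 are marked not applicable, the gate reduces to conditions 1, 4, and 5. A minimal sketch of that check, assuming a hypothetical `ProjectInfo` record with `pool_id` and `hashed_storage` fields (neither name comes from the codebase):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProjectInfo:
    # Hypothetical summary of the two facts the gate needs per project.
    pool_id: Optional[int]   # None means the project is not in a pool
    hashed_storage: bool


def can_fork_into_pool(parent: ProjectInfo, fork: ProjectInfo) -> bool:
    # Condition 1: the parent project is in a pool.
    # Condition 4: the parent project uses hashed storage.
    # Condition 5: the new project uses hashed storage.
    return (
        parent.pool_id is not None
        and parent.hashed_storage
        and fork.hashed_storage
    )
```

When the check returns `False`, the fork would presumably fall back to the existing clone-then-deduplicate path.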

Additionally, the Sidekiq queues for object pools currently have a priority of 1. If and when fast forking uses object pools, that priority should be raised significantly.

Links / references

Edited Jul 29, 2019 by James Ramsay (ex-GitLab)