Draft: Make forking fast for public projects

Stan Hu requested to merge sh-support-fast-fork into master

Previously, when a public project was forked, GitLab performed a full git clone of the source repository. Only after that completed did it link the parent's object pool as a Git alternate:

graph LR
    A[ForksController#create] --> B(ForkService#execute)
    B --> C(Projects::CreateService#execute)
    C --> |ProjectImportState#run_after_commit| D(RepositoryForkWorker)
    D(RepositoryForkWorker) --> |CreateFork RPC| E(Project#after_import)
    E(Project#after_import) --> F(Project#join_pool_repository)
    F(Project#join_pool_repository) --> G(ObjectPool::JoinWorker)
    G(ObjectPool::JoinWorker) --> |LinkRepositoryToObjectPool RPC| Done

However, that is inefficient because the full repository still has to be cloned. To speed this up, we can pass the object pool to the CreateFork RPC, which allows a shallow clone for projects on the same shard as the parent. The fork flow looks the same, except that the CreateFork RPC takes an extra object_pool parameter:

graph LR
    A[ForksController#create] --> B(ForkService#execute)
    B --> C(Projects::CreateService#execute)
    C --> |ProjectImportState#run_after_commit| D(RepositoryForkWorker)
    D(RepositoryForkWorker) --> |CreateFork RPC with pool| E(Project#after_import)
    E(Project#after_import) --> F(Project#join_pool_repository)
    F(Project#join_pool_repository) --> G(ObjectPool::JoinWorker)
    G(ObjectPool::JoinWorker) --> |LinkRepositoryToObjectPool RPC| Done

Calling ObjectPool::JoinWorker is no longer strictly necessary, since the CreateFork RPC has already linked the pool repository, but it can't hurt to do it again.
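
As a rough illustration of when the pool would be passed along, here is a minimal Ruby sketch. It is not the code in this merge request: `Pool`, `Source`, `pool_for_fork`, and `create_fork_args` are hypothetical names, and the check that a pool exists is an assumption. It only encodes the conditions stated above (public source project, target on the same shard as the parent's pool).

```ruby
# Hypothetical stand-ins for illustration only; not GitLab's actual classes.
Pool = Struct.new(:shard, keyword_init: true)
Source = Struct.new(:visibility, :shard, :pool, keyword_init: true)

# Returns the pool to send with the CreateFork request, or nil to fall
# back to the previous behaviour (a full clone, linked to the pool later).
def pool_for_fork(source, target_shard)
  return nil unless source.visibility == :public    # fast path is for public projects
  return nil if source.pool.nil?                     # assumption: parent may have no pool
  return nil unless source.pool.shard == target_shard # pool must live on the same shard

  source.pool
end

# Builds hypothetical request arguments; :object_pool is only set when eligible.
def create_fork_args(source, target_shard)
  args = { source: source, target_shard: target_shard }
  pool = pool_for_fork(source, target_shard)
  args[:object_pool] = pool if pool
  args
end
```

With this shape, a fork onto a different shard simply omits `object_pool` and takes the original path, so the later join step remains the place where the pool gets linked.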

Performance

Fork tests were performed with a local copy of gitlab-org/gitlab.

  • Before: RepositoryForkWorker completed in 97 s.
  • After: RepositoryForkWorker completed in 4.7 s.

Relates to #24523

Edited by Stan Hu