Draft: Make forking fast for public projects
Previously, when a public project was forked, GitLab performed a full git clone of the source repository. Only after that clone completed did it
link the parent's object pool as a Git alternate:
```mermaid
graph LR
A[ForksController#create] --> B(ForkService#execute)
B --> C(Projects::CreateService#execute)
C --> |ProjectImportState#run_after_commit| D(RepositoryForkWorker)
D(RepositoryForkWorker) --> |CreateFork RPC| E(Project#after_import)
E(Project#after_import) --> F(Project#join_pool_repository)
F(Project#join_pool_repository) --> G(ObjectPool::JoinWorker)
G(ObjectPool::JoinWorker) --> |LinkRepositoryToObjectPool RPC| Done
```
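Linking a pool "as a Git alternate" relies on Git's alternates mechanism. The forked repository lists the pool's object directory in `objects/info/alternates`, and Git looks there for any object it cannot find in its own object store. The sketch below models only that underlying Git mechanism under simplified assumptions. `link_alternate` is a hypothetical helper, not the GitLab or Gitaly code behind ObjectPool::JoinWorker, and real deployments may store a relative path instead of an absolute one.

```python
import tempfile
from pathlib import Path


def link_alternate(repo_path: Path, pool_objects_dir: Path) -> Path:
    """Point a bare repository at a pool's object directory.

    Hypothetical helper (not GitLab or Gitaly API). Git reads
    objects/info/alternates and searches each directory listed there
    for objects missing from the repository's own object store.
    """
    info_dir = repo_path / "objects" / "info"
    info_dir.mkdir(parents=True, exist_ok=True)
    alternates = info_dir / "alternates"
    # One object directory per line; this sketch writes a single absolute path.
    alternates.write_text(f"{pool_objects_dir}\n")
    return alternates


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fork = Path(tmp) / "fork.git"
        pool_objects = Path(tmp) / "pool.git" / "objects"
        path = link_alternate(fork, pool_objects)
        print(path.read_text(), end="")
```

Because the fork only references the pool's objects instead of copying them, the expensive part of the old flow is the full clone that happens before this link is created.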
However, this is inefficient because the full repository still has to be
cloned first. To speed this up, we can pass the object pool to the
CreateFork RPC, which allows a shallow clone for forks on
the same shard as the parent. The fork flow stays the same, except that CreateFork now receives an extra object_pool parameter:
```mermaid
graph LR
A[ForksController#create] --> B(ForkService#execute)
B --> C(Projects::CreateService#execute)
C --> |ProjectImportState#run_after_commit| D(RepositoryForkWorker)
D(RepositoryForkWorker) --> |CreateFork RPC with pool| E(Project#after_import)
E(Project#after_import) --> F(Project#join_pool_repository)
F(Project#join_pool_repository) --> G(ObjectPool::JoinWorker)
G(ObjectPool::JoinWorker) --> |LinkRepositoryToObjectPool RPC| Done
```
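The decision this change adds can be modeled in a few lines. This is a minimal Python sketch under stated assumptions: `Project`, `CreateForkRequest`, `pool_path` and `build_create_fork_request` are hypothetical names, not GitLab's models or Gitaly's protobuf definitions. It encodes the rule described above: the parent's pool is attached to the CreateFork request only when the parent has a pool and the fork lives on the same shard, presumably because alternates are resolved through the local filesystem.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Project:
    # Hypothetical, simplified project model; not GitLab's schema.
    name: str
    shard: str
    pool_path: Optional[str] = None  # parent's object pool, if it has one


@dataclass
class CreateForkRequest:
    # Hypothetical stand-in for the CreateFork RPC payload.
    source: str
    target: str
    object_pool: Optional[str] = None


def build_create_fork_request(parent: Project, fork: Project) -> CreateForkRequest:
    """Attach the parent's object pool only when the fork can use it."""
    request = CreateForkRequest(source=parent.name, target=fork.name)
    # A pool on another shard is not reachable from the fork's storage,
    # so in that case fall back to a plain CreateFork without a pool.
    if parent.pool_path is not None and parent.shard == fork.shard:
        request.object_pool = parent.pool_path
    return request


if __name__ == "__main__":
    parent = Project("gitlab-org/gitlab", "default", "@pools/example.git")
    print(build_create_fork_request(parent, Project("me/gitlab", "default")))
    print(build_create_fork_request(parent, Project("me/gitlab", "other")))
```

Keeping the pool optional means cross-shard forks still take the old full-clone path, so the change only affects forks that can benefit from it.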
Calling ObjectPool::JoinWorker is no longer strictly necessary, since CreateFork has already linked the fork to the pool repository, but running it again is harmless.
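A second join is harmless as long as linking is idempotent. The sketch below illustrates that property under simplified assumptions: `join_pool` is a hypothetical function, not the actual ObjectPool::JoinWorker or LinkRepositoryToObjectPool implementation. It checks whether the pool is already listed as an alternate and leaves the repository untouched if so.

```python
import tempfile
from pathlib import Path


def join_pool(repo_path: Path, pool_objects_dir: Path) -> bool:
    """Hypothetical idempotent join; returns True only if it changed anything."""
    alternates = repo_path / "objects" / "info" / "alternates"
    wanted = str(pool_objects_dir)
    if alternates.exists() and wanted in alternates.read_text().splitlines():
        # Already linked (for example by CreateFork): nothing to do.
        return False
    # Not linked yet: write the pool as the repository's sole alternate.
    alternates.parent.mkdir(parents=True, exist_ok=True)
    alternates.write_text(wanted + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "fork.git"
        pool_objects = Path(tmp) / "pool.git" / "objects"
        print(join_pool(repo, pool_objects))  # first call links the pool
        print(join_pool(repo, pool_objects))  # second call is a no-op
```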
Performance
Fork tests were performed with a local copy of gitlab-org/gitlab:
- Before: RepositoryForkWorker completed in 97 s.
- After: RepositoryForkWorker completed in 4.7 s (roughly 20x faster).
Relates to #24523
Edited by Stan Hu