Draft: Make forking fast for public projects
Previously, when a public project was forked, it would do a full git clone
from the source repository. After that completed, it would
link the parent's object pool as a Git alternate:
graph LR
A[ForksController#create] --> B(ForkService#execute)
B --> C(Projects::CreateService#execute)
C --> |ProjectImportState#run_after_commit| D(RepositoryForkWorker)
D(RepositoryForkWorker) --> |CreateFork RPC| E(Project#after_import)
E(Project#after_import) --> F(Project#join_pool_repository)
F(Project#join_pool_repository) --> G(ObjectPool::JoinWorker)
G(ObjectPool::JoinWorker) --> |LinkRepositoryToObjectPool RPC| Done
However, that is inefficient because the full repository still has to be
cloned. To speed this up, we can provide an object pool to the CreateFork
RPC, which allows a shallow clone for projects on the same shard as the
parent. Now the fork flow looks similar, except the CreateFork RPC has an
extra object_pool parameter:
graph LR
A[ForksController#create] --> B(ForkService#execute)
B --> C(Projects::CreateService#execute)
C --> |ProjectImportState#run_after_commit| D(RepositoryForkWorker)
D(RepositoryForkWorker) --> |CreateFork RPC with pool| E(Project#after_import)
E(Project#after_import) --> F(Project#join_pool_repository)
F(Project#join_pool_repository) --> G(ObjectPool::JoinWorker)
G(ObjectPool::JoinWorker) --> |LinkRepositoryToObjectPool RPC| Done
Calling JoinWorker isn't strictly necessary since the CreateFork RPC
has already linked the pool repository, but it can't hurt to do it again.
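The same-shard condition can be illustrated with a small plain-Ruby sketch. This is not the code in this merge request or in Gitaly: `Repo`, `ForkRequest`, and `build_fork_request` are hypothetical names, the paths are made up, and the rule that the pool is only passed when the shards match is an assumption based on the description above.

```ruby
# Hypothetical model of the decision described above; none of these
# names come from GitLab or Gitaly.
Repo = Struct.new(:path, :shard, :pool_path)
ForkRequest = Struct.new(:source_path, :target_path, :object_pool)

# Builds the arguments for a CreateFork-style call. The pool is attached
# only when the target shard matches the parent's shard (assumption: a
# Git alternate has to live on the same storage as the repository that
# borrows objects from it).
def build_fork_request(parent, target_path, target_shard)
  pool = parent.pool_path if parent.shard == target_shard
  ForkRequest.new(parent.path, target_path, pool)
end

parent = Repo.new("gitlab-org/gitlab.git", "default", "@pools/example.git")

same_shard  = build_fork_request(parent, "user/gitlab.git", "default")
other_shard = build_fork_request(parent, "user/gitlab.git", "storage2")

puts same_shard.object_pool.inspect
puts other_shard.object_pool.inspect
```

In this sketch a fork on a different shard gets no pool and therefore falls back to the full clone described at the top.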
Performance
Fork tests were performed with a local copy of gitlab-org/gitlab.
- Before: RepositoryForkWorker completed in 97 s.
- After: RepositoryForkWorker completed in 4.7 s.
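For a sense of scale, the two timings can be turned into a ratio. The snippet below only does arithmetic on the numbers reported above; the variable names are illustrative.

```ruby
before_s = 97.0 # RepositoryForkWorker duration before the change, in seconds
after_s  = 4.7  # RepositoryForkWorker duration after the change, in seconds

speedup       = (before_s / after_s).round(1)
reduction_pct = ((1 - after_s / before_s) * 100).round(1)

puts "#{speedup}x faster, #{reduction_pct}% less time"
# prints "20.6x faster, 95.2% less time"
```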
Relates to #24523
Edited by Stan Hu