Document partial clone workflow for monolithic repositories
Problem
Working on a project in a very large Git repository (e.g. 100 GB) is very difficult, because the whole repository must be cloned and because the working copy can contain a huge number of files (possibly millions).
Partial clone and sparse checkout solve this problem, but are currently very basic and hard to use. We should document how to use them in their alpha state.
Proposal
Add documentation of the current workflow for using partial clone and sparse checkout.
Documentation
Local server
Using Git v2.21, and a repository already cloned locally. If you use gitlab-ce, the partial clone may take 10 minutes or longer, because enumerating objects is slow.
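Before starting, it is worth confirming that the installed Git is new enough. A minimal sketch, assuming v2.21 as the minimum (as stated above; the exact minimum version for the `sparse:path` filter is an assumption):

```shell
# Parse the version out of `git version` and compare against v2.21
version=$(git version | awk '{print $3}')
major=${version%%.*}
rest=${version#*.}
minor=${rest%%.*}
if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 21 ]; }; then
  echo "Git $version: OK for partial clone"
else
  echo "Git $version: too old, need v2.21+" >&2
fi
```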
```shell
# Enable `uploadpack.allowFilter`
git config --local uploadpack.allowFilter true

# Enable `uploadpack.allowAnySHA1InWant`
git config --local uploadpack.allowAnySHA1InWant true

# Create a file describing what to clone. The same paths will be used for
# sparse checkout.
#
# WARNING: We are putting this file on the server. When we clone we will
# tell Git the absolute path to this file on the server!
echo "doc/" >> $HOME/partial_clone

# Optional: take note of the size of the original project
du -sh <path>
```
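The server-side settings can be verified with `git config --get`. A self-contained sketch, using a throwaway repository to stand in for gitlab-ce:

```shell
# Create a throwaway repository and apply the two uploadpack settings
repo=$(mktemp -d)
git init --quiet "$repo"
git -C "$repo" config --local uploadpack.allowFilter true
git -C "$repo" config --local uploadpack.allowAnySHA1InWant true

# Read the settings back; each prints "true"
git -C "$repo" config --local --get uploadpack.allowFilter
git -C "$repo" config --local --get uploadpack.allowAnySHA1InWant
```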
Partial clone
```shell
# Perform a partial clone, but do not checkout.
#
# `--no-checkout` is required to prevent Git from lazily fetching the
#   missing objects during checkout
# `--filter=sparse:path=<path>` takes an absolute path to the filter file
#   on the server
# `file://` is needed because we are cloning from the local filesystem
git clone --no-checkout \
  --filter=sparse:path="$HOME/partial_clone" \
  "file://$HOME/gitlab-ce" sparse-ce

# Observe that there are missing objects
git rev-list --all --quiet --objects --missing=print

# Observe that the repository is smaller than the original
#
# gitlab-ce: 3,500M
# sparse-ce:   390M
du -sh <path>
```
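Missing objects are printed by `rev-list --missing=print` with a leading `?`, so they can be counted. A self-contained sketch on a throwaway full clone, where the count should be zero (in the sparse-ce clone above, it would be large):

```shell
# Build a tiny upstream repository with one commit
src=$(mktemp -d)
git init --quiet "$src"
echo hello > "$src/file.txt"
git -C "$src" add file.txt
git -C "$src" -c user.email=you@example.com -c user.name=you \
  commit --quiet -m init

# A full (unfiltered) clone has no missing objects
dst=$(mktemp -d)/clone
git clone --quiet "file://$src" "$dst"
missing=$(git -C "$dst" rev-list --all --quiet --objects --missing=print \
  | grep -c '^?' || true)
echo "$missing missing objects"   # prints "0 missing objects"
```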
Sparse checkout
```shell
# Configure sparse checkout, using the same paths as the partial clone filter
git config --local core.sparseCheckout true
echo "doc/" >> .git/info/sparse-checkout

# Checkout master
git checkout master
```
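After the checkout, only paths matching the sparse-checkout patterns should appear in the working tree. A self-contained sketch of the `--no-checkout` / sparse checkout steps on a throwaway repository (no clone filter involved, so it runs on any recent Git; all paths are hypothetical):

```shell
# Build a tiny upstream with doc/ and src/ directories
src=$(mktemp -d)
git init --quiet "$src"
git -C "$src" symbolic-ref HEAD refs/heads/master
mkdir -p "$src/doc" "$src/src"
echo manual > "$src/doc/index.md"
echo code > "$src/src/main.rb"
git -C "$src" add .
git -C "$src" -c user.email=you@example.com -c user.name=you \
  commit --quiet -m init

# Clone without checkout, restrict the checkout to doc/, then checkout
dst=$(mktemp -d)/sparse
git clone --quiet --no-checkout "file://$src" "$dst"
git -C "$dst" config --local core.sparseCheckout true
echo "doc/" > "$dst/.git/info/sparse-checkout"
git -C "$dst" checkout --quiet master

# Only doc/ is present in the working tree
ls "$dst"   # prints "doc"
```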
Links / references
Edited by James Ramsay (ex-GitLab)