Document partial clone workflow for monolithic repositories

Problem

Working on a project in a very large Git repository (e.g. 100GB) is difficult for two reasons: the whole repository must be cloned, and the working copy contains a huge number of files (possibly millions).

Partial clone and sparse checkout solve this problem, but are currently very basic and hard to use. We should document how to use them in their alpha state.

Proposal

Add documentation of the current workflow for using partial clone and sparse checkout.

Documentation

Local server

These steps assume Git v2.21 and a repository that is already cloned locally.

If you use gitlab-ce, the partial clone may take 10 minutes or longer because enumerating objects is slow.

# Run these commands inside the repository that will act as the server (here $HOME/gitlab-ce).
# Enable `uploadpack.allowFilter`
git config --local uploadpack.allowFilter true

# Enable `uploadpack.allowAnySHA1InWant` 
git config --local uploadpack.allowAnySHA1InWant true

# Create a file describing what to clone. The same config will be used for sparse checkout.
# 
# WARNING: We are putting this file on the server. When we clone we will tell Git the
# absolute path to this file on the server!
#
echo "doc/" >> "$HOME/partial_clone"

# Optional: take note of the size of the original project
du -sh <path>
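
Because the setup step above appends with `>>`, running it again adds duplicate lines to the spec file. A minimal sketch of a helper that overwrites the file instead is shown below; `write_spec` is a hypothetical name, not a Git command, and the one-pattern-per-line layout is taken from the example above.

```shell
# Hypothetical helper (not part of Git): write one pattern per line to a
# spec file, overwriting previous contents so re-runs do not duplicate entries.
write_spec() {
  spec_file=$1
  shift
  printf '%s\n' "$@" > "$spec_file"
}

# Usage with the path and pattern from the steps above:
#   write_spec "$HOME/partial_clone" "doc/"
```

Passing several patterns, for example `write_spec "$HOME/partial_clone" "doc/" "app/"`, writes one line per pattern.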

Partial clone

# Perform a partial clone, but do not checkout
#
#   `--no-checkout` is required to prevent Git from lazily fetching missing data during checkout
#   `--filter=sparse:path=<path>` takes the absolute path to the spec file on the server
#   `file://` is needed because a plain local path bypasses the Git transport and the filter is ignored
git clone --no-checkout \
  --filter=sparse:path="$HOME/partial_clone" \
  "file://$HOME/gitlab-ce" sparse-ce

# Enter the new clone and observe there are missing objects
cd sparse-ce
git rev-list --all --quiet --objects --missing=print

# Observe that the repository is smaller than the original
#
#   gitlab-ce: 3,500M
#   sparse-ce:   390M
#
du -sh <path>
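
The two observations above can be turned into numbers. The sketch below is only an illustration: `count_missing`, `dir_kb` and `percent_of` are hypothetical helper names, and it assumes that `--missing=print` marks each missing object with a leading `?`.

```shell
# Hypothetical helpers, not part of Git.

# Count missing objects read from stdin: `--missing=print` prefixes each
# missing object ID with "?".
count_missing() {
  grep -c '^?' || true   # grep exits non-zero when the count is 0
}

# Size of a directory in kilobytes.
dir_kb() {
  du -sk "$1" | cut -f1
}

# Integer percentage of $1 relative to $2 (rounded down).
percent_of() {
  echo $(( $1 * 100 / $2 ))
}

# Usage, from the directory that contains sparse-ce:
#   git -C sparse-ce rev-list --all --quiet --objects --missing=print | count_missing
#   percent_of "$(dir_kb sparse-ce)" "$(dir_kb "$HOME/gitlab-ce")"

# With the sizes quoted above (390M vs 3,500M):
percent_of 390 3500   # prints 11
```

In this example the partial clone is roughly 11% of the size of the original repository.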

Sparse checkout

# Configure sparse checkout
git config --local core.sparsecheckout true
echo "doc/" >> .git/info/sparse-checkout

# Checkout master
git checkout master
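
One way to check the result is to list any checked-out files that fall outside the sparse pattern. This is a sketch under assumptions: `outside_prefix` is a hypothetical helper, it relies on `git ls-files -t` tagging checked-out entries with `H` and skipped entries with `S`, and it uses the `doc/` pattern from the spec file created on the server.

```shell
# Hypothetical helper (not part of Git): read `git ls-files -t` output on
# stdin and print checked-out paths (tag "H") that do not start with the
# given prefix. Entries tagged "S" (skip-worktree) are ignored.
outside_prefix() {
  prefix=$1
  awk -v p="$prefix" '
    $1 == "H" {
      path = substr($0, 3)            # drop the tag and the space after it
      if (index(path, p) != 1) print path
    }'
}

# Usage inside the sparse clone; no output means every checked-out file
# is under doc/:
#   git ls-files -t | outside_prefix "doc/"
```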

Links / references

Edited by James Ramsay (ex-GitLab)