Feature: add blobless clone option
Problem
GDK clones of gitlab-org/gitlab are slow.
Solution
Idea: allow passing an option to gdk so that it does a blobless clone.
https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---filterltfilter-specgt
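A blobless clone downloads all commits and trees up front but fetches file contents (blobs) only when they are needed, so it keeps most of the speed benefit of a shallow clone without truncating history. A more detailed description is in the GitHub blog post quoted below.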
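As a rough sketch of how such an option could map onto git clone flags (the helper name clone_flags and the mode names are hypothetical, not existing GDK settings, and --depth=1 for the shallow mode is an assumption about how the current shallow clone is invoked):

```shell
# Hypothetical helper: translate a clone mode into git clone flags.
# The mode names are made up for illustration; only the git flags are real.
clone_flags() {
  case "$1" in
    blobless) echo "--filter=blob:none" ;;  # all commits and trees, blobs on demand
    shallow)  echo "--depth=1" ;;           # assumed depth; truncates history
    full)     echo "" ;;                    # regular clone, no extra flags
    *)        echo "unknown clone mode: $1" >&2; return 1 ;;
  esac
}

# Example invocation (not run here, since it needs network access):
#   git clone $(clone_flags blobless) https://gitlab.com/gitlab-org/gitlab.git gitlab
```

Keeping the mode-to-flag mapping in one place would let the existing shallow option and a new blobless option share the same clone code path.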
More detail
GDK already provides the option to install with a shallow clone.
The benefit of a shallow clone is that it is faster than a regular clone. A GitLab contributor recently reported in Slack that cloning gitlab-org/gitlab was taking over an hour, so speedier clones are helpful for developer efficiency.
Shallow clones do have downsides, though:
- Since the commit history is truncated, commands such as git merge-base or git log show different results than they would in a full clone
- A git fetch operation in a shallow clone might end up downloading an almost-full commit history
As the GitHub blog says:
"For these reasons we do not recommend shallow clones except for builds that delete the repository immediately afterwards. Fetching from shallow clones can cause more harm than good."
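The truncated history is easy to reproduce locally. The following is a minimal sketch using a throwaway repository (the directory layout, file name, and committer identity are placeholders for illustration):

```shell
set -e
demo=$(mktemp -d)

# Build a small origin repository with three commits.
git init -q "$demo/origin"
git -C "$demo/origin" config user.email "dev@example.com"
git -C "$demo/origin" config user.name "Example Dev"
for i in 1 2 3; do
  echo "$i" > "$demo/origin/file.txt"
  git -C "$demo/origin" add file.txt
  git -C "$demo/origin" commit -q -m "commit $i"
done

# Clone it once in full and once with a depth of 1.
git clone -q "file://$demo/origin" "$demo/full"
git clone -q --depth=1 "file://$demo/origin" "$demo/shallow"

# Count the commits each clone can see from HEAD.
full_count=$(git -C "$demo/full" rev-list --count HEAD)
shallow_count=$(git -C "$demo/shallow" rev-list --count HEAD)
echo "full: $full_count commits, shallow: $shallow_count commits"
```

The shallow clone only knows about the most recent commit, which is why history-based commands such as git log and git merge-base can give different answers there.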
Blobless clones, on the other hand, do not have these problems:
- When running git fetch in a blobless clone, the server only sends the new commits and trees. The new blobs are downloaded only after a git checkout.
- Blobless clones can perform commands like git merge-base, git log, or even git log -- <path> with the same performance as a full clone.
Commands like git diff or git blame require the contents of the paths to compute diffs, so these will trigger blob downloads the first time they are run. However, the good news is that after that you will have those blobs in your repository and do not need to download them a second time. Most developers only need to run git blame on a small number of files, so this tradeoff of a slightly slower git blame command is worth the faster clone and fetch times.
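The on-demand blob download can be observed with a local experiment. This is a sketch under assumptions: it uses a throwaway origin repository served over file://, enables uploadpack.allowFilter and uploadpack.allowAnySHA1InWant on it so that the local "server" honors the filter and serves individual blobs, and uses --no-checkout so that no blobs are fetched at clone time. Paths, file names, and the committer identity are placeholders.

```shell
set -e
tmp=$(mktemp -d)

# Throwaway origin with two commits that each change a.txt.
git init -q "$tmp/origin"
git -C "$tmp/origin" config user.email "dev@example.com"
git -C "$tmp/origin" config user.name "Example Dev"
# Allow partial-clone filters and on-demand object requests from this origin.
git -C "$tmp/origin" config uploadpack.allowFilter true
git -C "$tmp/origin" config uploadpack.allowAnySHA1InWant true
for text in one two; do
  echo "$text" > "$tmp/origin/a.txt"
  git -C "$tmp/origin" add a.txt
  git -C "$tmp/origin" commit -q -m "write $text"
done

# Blobless clone without a checkout: commits and trees only.
git clone -q --no-checkout --filter=blob:none "file://$tmp/origin" "$tmp/blobless"

# Count objects reachable from HEAD that are not present locally.
# --missing=print lists them with a leading "?" instead of fetching them.
count_missing() {
  git -C "$tmp/blobless" rev-list --objects --missing=print HEAD | grep -c '^?' || true
}

history_len=$(git -C "$tmp/blobless" rev-list --count HEAD)
missing_before=$(count_missing)

# git blame needs file contents, so it fetches the blobs it touches.
git -C "$tmp/blobless" blame HEAD -- a.txt > /dev/null
missing_after=$(count_missing)

echo "commits: $history_len, missing blobs before blame: $missing_before, after: $missing_after"
```

The full commit history is available immediately, while the blobs arrive only when git blame asks for them and are then kept locally for later runs.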