Feature: add blobless clone option

Problem

GDK clones of gitlab-org/gitlab are slow.

Solution

Idea: allow passing an option to gdk so that it does a blobless clone, i.e. a partial clone made with git clone --filter=blob:none.

https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---filterltfilter-specgt

A blobless clone keeps most of the speed benefit of a shallow clone but still downloads the full commit history, so it avoids the downsides of a truncated history. A more detailed description is in this blog post.
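To make the proposal concrete, here is a minimal sketch of the git invocation each clone mode could map to. The function name `clone_args` and the mode names are hypothetical and are not GDK's actual configuration or code; only `--filter=blob:none` and `--depth` are real git clone flags.

```python
def clone_args(url, dest, mode="full"):
    """Build a `git clone` argument list for a (hypothetical) clone mode."""
    args = ["git", "clone"]
    if mode == "blobless":
        # Partial clone: fetch all commits and trees, defer blobs until needed.
        args.append("--filter=blob:none")
    elif mode == "shallow":
        # Truncated history: only the most recent commit is fetched.
        args.append("--depth=1")
    elif mode != "full":
        raise ValueError(f"unknown clone mode: {mode}")
    return args + [url, dest]


if __name__ == "__main__":
    # Prints the command a blobless GDK install might run (illustrative only).
    print(" ".join(clone_args("https://gitlab.com/gitlab-org/gitlab.git",
                              "gitlab", "blobless")))
```

Keeping the mode-to-flag mapping in one place would let the existing shallow option and the proposed blobless option share the same code path.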

More detail

GDK already provides the option to install with a shallow clone.

The benefit of a shallow clone is that it is faster than a regular clone. A GitLab contributor recently reported in Slack that cloning gitlab-org/gitlab was taking over an hour, so speedier clones are helpful for developer efficiency.

Shallow clones do have downsides, though:

  • Since the commit history is truncated, commands such as git merge-base or git log show different results than they would in a full clone
  • A git fetch operation in a shallow clone might end up downloading an almost-full commit history

As the GitHub blog says:

For these reasons we do not recommend shallow clones except for builds that delete the repository immediately afterwards. Fetching from shallow clones can cause more harm than good

Blobless clones, on the other hand, do not have these problems:

  • When running git fetch in a blobless clone, the server only sends the new commits and trees. The new blobs are downloaded only after a git checkout.
  • Blobless clones can perform commands like git merge-base, git log, or even git log -- <path> with the same performance as a full clone.

Commands like git diff or git blame require the contents of the paths to compute diffs, so these will trigger blob downloads the first time they are run. However, the good news is that after that you will have those blobs in your repository and do not need to download them a second time. Most developers only need to run git blame on a small number of files, so this tradeoff of a slightly slower git blame command is worth the faster clone and fetch times.
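If GDK needs to tell whether an existing checkout is already blobless, one possible approach is to read the filter that git records for the remote. A partial clone stores it under the `remote.<name>.partialclonefilter` config key. The helper below is a hypothetical sketch, not part of GDK, and only parses the text printed by `git config --get-regexp`.

```python
def partial_clone_filters(config_output):
    """Map remote name -> filter spec from `git config --get-regexp` output.

    Each output line is expected to look like:
        remote.origin.partialclonefilter blob:none
    """
    filters = {}
    prefix, suffix = "remote.", ".partialclonefilter"
    for line in config_output.splitlines():
        key, _, value = line.strip().partition(" ")
        if key.startswith(prefix) and key.endswith(suffix):
            remote = key[len(prefix):-len(suffix)]
            filters[remote] = value
    return filters
```

The input would come from running `git config --get-regexp '^remote\..*\.partialclonefilter$'` inside the repository. An empty result means no remote has a partial clone filter configured.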