[GitLab Runner] How optimised is the pipeline git clone strategy?

Problem to solve

The docs for the pipeline git clone strategy (https://docs.gitlab.com/ee/ci/yaml/README.html#git-strategy) say that clone is the slowest of the options. I want to find out whether there are any optimisations going on under the hood, or whether it's just a plain clone.

On our Jenkins builds we use a couple of techniques to get the equivalent of a clean checkout, but with a significant speed boost.

Further details

What we do:

  1. We keep a long-lived bare repository locally on each CI machine.
  2. At the start of each job, we run a fetch in this bare repo to bring it up to date.
  3. We do a `--reference-if-able` clone into the working directory for the CI job, specifying the bare repo as the repository to reference (we skip the LFS download at this stage).
  4. We do something similar with git-lfs: we use the `lfs.storage` config to keep a long-lived directory of LFS blobs around, and point the working-directory repo at it for the LFS pull (see the sketch below).
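Roughly, the flow looks like the sketch below. The paths, remote URL, and the `GIT_LFS_SKIP_SMUDGE` trick for deferring the LFS download are illustrative assumptions, not our exact scripts and not anything GitLab Runner does today:

```shell
# Sketch only: paths and URL are placeholders, and GIT_LFS_SKIP_SMUDGE is one
# (assumed) way to defer the LFS download until after the clone.
CACHE_REPO=/ci-cache/my-project.git      # long-lived bare repo on the CI machine
LFS_STORAGE=/ci-cache/lfs-objects        # long-lived store of LFS blobs
REMOTE_URL=https://example.com/group/my-project.git
BUILD_DIR=./job-workspace                # fresh working directory for this job

# 1. One-off per machine: create the bare cache repo
[ -d "$CACHE_REPO" ] || git clone --bare "$REMOTE_URL" "$CACHE_REPO"

# 2. Bring the cache up to date at the start of each job
git -C "$CACHE_REPO" fetch --prune origin

# 3. Clone into the job's working directory, borrowing objects from the cache;
#    skipping the LFS smudge means no LFS data is downloaded during the clone itself
GIT_LFS_SKIP_SMUDGE=1 git clone --reference-if-able "$CACHE_REPO" "$REMOTE_URL" "$BUILD_DIR"

# 4. Point LFS at the shared blob store, then pull the LFS files
git -C "$BUILD_DIR" config lfs.storage "$LFS_STORAGE"
git -C "$BUILD_DIR" lfs pull
```

Because `--reference-if-able` lets the new clone borrow objects from the local bare repo, only refs and any missing objects come over the network, and if the reference repo is unavailable the clone simply falls back to a normal full clone.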

Proposal

If there aren't optimisations like this going on, it would be great to add them as an option for git-strategy (I can detail them further here, as I did a fair bit of experimentation to find the best setup). Maybe something like `efficient-clone`?

What does success look like, and how can we measure that?

A git strategy that is:

  • Faster than clone ⏩
  • Cleaner than fetch ✨

🎉

Links / references
