Support partial clone and sparse-checkout pattern in each job
Description
For a multi-tenant monorepo, not all files are needed to perform a CI job. In fact, testing a component requires only a subset of the files and directories in the repository.
For this reason, Git introduced Git Sparse Checkout, which can be used to limit the number of files checked out in a working copy.
When used in combination with Git Partial Clone, this feature makes downloading a big monorepo extremely lightweight and fast.
Proposal
There are two proposals:
- Provide options for jobs to specify a fetch filter to enable Partial Clone. These options should be limited to tree:0 or blob:none for performance reasons.
- Provide a keyword in the GitLab CI YAML spec for jobs to specify which directories to check out.
jobA:
  sparse-checkout:
    cone-mode: enable
    spec:
      - dirA
      - dirC/fileD
      - dirG
This should be translated to
git sparse-checkout init --cone
echo "dirA
dirC/fileD
dirG" | git sparse-checkout set --stdin
prior to actually running git checkout --force <rev>, before the user script runs.
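The translation above could be wrapped in a small helper. This is only a sketch: `apply_sparse_checkout` is a hypothetical name, not part of GitLab Runner, and it assumes it is called inside an already initialized repository, before `git checkout`.

```shell
# Hypothetical helper (illustrative name, not an existing GitLab Runner function).
# Enables cone-mode sparse checkout and registers every path given as an
# argument, one per line, through --stdin.
apply_sparse_checkout() {
  git sparse-checkout init --cone &&
    printf '%s\n' "$@" | git sparse-checkout set --stdin
}
```

For the job above, the call would be `apply_sparse_checkout dirA dirC/fileD dirG`. Git's documentation describes cone-mode input as a list of directories, so how a file entry such as dirC/fileD should be handled is left open in this sketch.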
Sample code of the two proposals when used together
~/test> mkdir gitlab
~/test> cd gitlab
~/test/gitlab> git init
Initialized empty Git repository in /Users/sluongngoc/test/gitlab/.git/
master ~/test/gitlab> git remote add origin git@gitlab.com:gitlab-org/gitlab.git
master ~/test/gitlab> git sparse-checkout init --cone
master ~/test/gitlab> echo "danger
scripts" | git sparse-checkout set --stdin
master ~/test/gitlab> git fetch --filter=tree:0 --no-tags --prune origin master
remote: Enumerating objects: 7054, done.
remote: Counting objects: 100% (7054/7054), done.
remote: Compressing objects: 100% (6784/6784), done.
remote: Total 185558 (delta 412), reused 6255 (delta 270), pack-reused 178504
Receiving objects: 100% (185558/185558), 53.52 MiB | 5.89 MiB/s, done.
Resolving deltas: 100% (7875/7875), done.
From gitlab.com:gitlab-org/gitlab
* branch master -> FETCH_HEAD
* [new branch] master -> origin/master
Expanding reachable commits in commit graph: 185558, done.
master ~/test/gitlab> git checkout master
remote: Enumerating objects: 3640, done.
remote: Counting objects: 100% (3640/3640), done.
remote: Compressing objects: 100% (3125/3125), done.
remote: Total 6875 (delta 7), reused 2305 (delta 5), pack-reused 3235
Receiving objects: 100% (6875/6875), 1.45 MiB | 2.89 MiB/s, done.
Resolving deltas: 100% (9/9), done.
remote: Enumerating objects: 84, done.
remote: Counting objects: 100% (84/84), done.
remote: Compressing objects: 100% (79/79), done.
remote: Total 139 (delta 1), reused 19 (delta 1), pack-reused 55
Receiving objects: 100% (139/139), 708.57 KiB | 1.79 MiB/s, done.
Resolving deltas: 100% (3/3), done.
Updating files: 100% (139/139), done.
Branch 'master' set up to track remote branch 'master' from 'origin'.
Already on 'master'
master ~/test/gitlab> find . -type d -depth 1
./scripts
./.git
./danger
master ~/test/gitlab> du -sh ../gitlab
77M ../gitlab
master ~/test/gitlab> du -sh ~/work/gitlab/gitlab
1.1G /Users/sluongngoc/work/gitlab/gitlab
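The fetch step of the session above, restricted to the two filters the proposal allows, might look like the following. This is a minimal sketch: `fetch_with_filter` is a hypothetical name, and the `--no-tags --prune` flags are copied from the sample session rather than from any confirmed runner behavior.

```shell
# Hypothetical helper (illustrative name). Arguments: <filter> <remote> <ref>.
# Rejects any filter other than the two named in the proposal, then runs a
# partial-clone fetch with the same flags as the sample session.
fetch_with_filter() {
  case "$1" in
    tree:0|blob:none) ;;
    *) echo "unsupported fetch filter: $1" >&2; return 1 ;;
  esac
  git fetch --filter="$1" --no-tags --prune "$2" "$3"
}
```

Calling `fetch_with_filter tree:0 origin master` reproduces the fetch from the session; any other filter value returns a non-zero status without contacting the remote.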
Links to related issues and merge requests / references
Discussed in !2283 (comment 385072650)