Support partial clone and sparse-checkout pattern in each job
## Description

For a multi-tenant monorepo, not all files are needed to perform a CI job. To test a component, only a subset of the files and directories in the repository is required. For this reason, Git introduced [Git Sparse Checkout](https://git-scm.com/docs/git-sparse-checkout), which limits the files checked out in a working copy. Used in combination with [Git Partial Clone](https://about.gitlab.com/blog/2020/03/13/partial-clone-for-massive-repositories/), it makes downloading a big monorepo much lighter and faster.

## Proposal

There are two proposals:

1. Provide options for jobs to specify a fetch filter to enable Partial Clone. These options should be limited to `tree:0` or `blob:none` for performance reasons.
2. Provide a keyword in the GitLab CI YAML spec for jobs to specify which directories to check out.

```
jobA:
  sparse-checkout:
    cone-mode: enable
    spec:
      - dirA
      - dirC/fileD
      - dirG
```

This should be translated to

```
git sparse-checkout init --cone
echo "dirA
dirC/fileD
dirG" | git sparse-checkout set --stdin
```

which runs before `git checkout --force <rev>`, which in turn runs before the user script.

---

Sample session using both proposals together:

```
~/test> mkdir gitlab
~/test> cd gitlab
~/test/gitlab> git init
Initialized empty Git repository in /Users/sluongngoc/test/gitlab/.git/
master ~/test/gitlab> git remote add origin git@gitlab.com:gitlab-org/gitlab.git
master ~/test/gitlab> git sparse-checkout init --cone
master ~/test/gitlab> echo "danger \
scripts" | git sparse-checkout set --stdin
master ~/test/gitlab> git fetch --filter=tree:0 --no-tags --prune origin master
remote: Enumerating objects: 7054, done.
remote: Counting objects: 100% (7054/7054), done.
remote: Compressing objects: 100% (6784/6784), done.
remote: Total 185558 (delta 412), reused 6255 (delta 270), pack-reused 178504
Receiving objects: 100% (185558/185558), 53.52 MiB | 5.89 MiB/s, done.
Resolving deltas: 100% (7875/7875), done.
From gitlab.com:gitlab-org/gitlab
 * branch            master     -> FETCH_HEAD
 * [new branch]      master     -> origin/master
Expanding reachable commits in commit graph: 185558, done.
master ~/test/gitlab> git checkout master
remote: Enumerating objects: 3640, done.
remote: Counting objects: 100% (3640/3640), done.
remote: Compressing objects: 100% (3125/3125), done.
remote: Total 6875 (delta 7), reused 2305 (delta 5), pack-reused 3235
Receiving objects: 100% (6875/6875), 1.45 MiB | 2.89 MiB/s, done.
Resolving deltas: 100% (9/9), done.
remote: Enumerating objects: 84, done.
remote: Counting objects: 100% (84/84), done.
remote: Compressing objects: 100% (79/79), done.
remote: Total 139 (delta 1), reused 19 (delta 1), pack-reused 55
Receiving objects: 100% (139/139), 708.57 KiB | 1.79 MiB/s, done.
Resolving deltas: 100% (3/3), done.
Updating files: 100% (139/139), done.
Branch 'master' set up to track remote branch 'master' from 'origin'.
Already on 'master'
master ~/test/gitlab> find . -type d -depth 1
./scripts
./.git
./danger
master ~/test/gitlab> du -sh ../gitlab
77M	../gitlab
master ~/test/gitlab> du -sh ~/work/gitlab/gitlab
1.1G	/Users/sluongngoc/work/gitlab/gitlab
```

The sparse, partial working copy takes 77M, compared with 1.1G for a full clone of the same repository.

## Links to related issues and merge requests / references

Discussed in https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/2283#note_385072650
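
## Illustrative sketch

The two proposals in the Proposal section could be combined roughly as follows. This is only a sketch under stated assumptions, not how GitLab Runner implements or would implement the feature. The function names `is_allowed_filter` and `sparse_fetch` are hypothetical, and checking out `FETCH_HEAD` stands in for the `<rev>` placeholder used in the proposal. The git commands themselves are the ones shown in the sample session.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of GitLab Runner.

# Succeed only for the two fetch filters the proposal allows.
is_allowed_filter() {
    case "$1" in
        tree:0|blob:none) return 0 ;;
        *) return 1 ;;
    esac
}

# sparse_fetch <remote-url> <ref> <filter> <path>...
# Intended to run inside an empty directory. Sets up a sparse,
# partial working copy limited to the given paths.
sparse_fetch() {
    if [ "$#" -lt 4 ]; then
        echo "usage: sparse_fetch <remote-url> <ref> <filter> <path>..." >&2
        return 2
    fi
    url=$1 ref=$2 filter=$3
    shift 3

    # Reject any filter other than tree:0 or blob:none before touching git.
    if ! is_allowed_filter "$filter"; then
        echo "unsupported fetch filter: $filter" >&2
        return 1
    fi

    git init --quiet &&
    git remote add origin "$url" &&
    git sparse-checkout init --cone &&
    # One path per line, as `git sparse-checkout set --stdin` expects.
    printf '%s\n' "$@" | git sparse-checkout set --stdin &&
    git fetch --filter="$filter" --no-tags --prune origin "$ref" &&
    # Assumption: the fetched commit is the revision the job should build.
    git checkout --force FETCH_HEAD
}
```

For example, `sparse_fetch git@gitlab.com:gitlab-org/gitlab.git master tree:0 danger scripts` would approximate the sample session, except that it checks out `FETCH_HEAD` in detached state instead of the `master` branch. Per the git-sparse-checkout documentation, cone mode takes directories as input, so a file entry such as `dirC/fileD` may need non-cone patterns instead.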
issue