Cache Cloned Repo on Runner
NOTE: See the epic for more context on the effort to reduce repo cloning time.
UPDATE
This third (and final) attempt at a caching-based approach to reducing the repo clone times was also unsuccessful, for the following reasons:
- The repo is large enough (6 GB) that even copying it on local disk (from the cache dir mounted on the runner to the Docker job instance dir) takes a significant amount of time. The duration also varies, presumably with I/O load on the instance: in testing, it ranged from 20 seconds to almost 2 minutes.
- The lifetime and reuse schedule of the runners means cache misses are frequent, possibly even more frequent than cache hits. The runner pool prefers "newer" runners to older ones, so a job is more likely to land on a runner that does not yet have the persistent repo cache and has to pull the repo down fresh. That download, coupled with the additional time to copy the cache to the instance, results in a longer overall time than just letting each job pull down the repo.
However, the work in this MR did uncover that GIT_STRATEGY is defaulting to clone instead of fetch. Switching to fetch has the potential to avoid a full clone on many job runs. See #7035 (closed) for more details.
DESCRIPTION
For each job in the www-gitlab-com CI/CD build, the git repo clone currently takes between a minute and a half and two minutes, because the repo is very large and most of that time is spent on network transfer.
If we cache the git clone locally on the runners, then copy it and pull only the commits added since the last image was built, this time could be greatly reduced.
See more details in this Slack thread.
BACKGROUND
This is the third attempt. The two prior attempts were:
However, these were not successful, primarily because of the time it takes to download the full volume of repo data over the network, regardless of how it is packaged.
More context on the prior attempts is available in those issues.
Thus, this approach attempts to avoid that by eliminating the full repo network download from each individual job, and instead only doing it once when the runner is provisioned, or only on the first job which runs on the runner.
IMPLEMENTATION
This implementation will leverage some of the same approaches used in the object-store-based caching approach, specifically the pre-clone-script approach.
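The flow described above (do the full network clone once per runner, then copy that clone into each job and fetch only new commits) could be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the actual pre-clone script from this MR: the function name seed_from_cache and its three arguments are hypothetical, and it assumes git and a cp that supports -a are available on the runner.

```shell
# Hypothetical helper: seed a job's working copy from a runner-local cache.
# The function name and arguments are illustrative assumptions.
seed_from_cache() {
  local repo_url="$1"   # remote to clone from
  local cache_dir="$2"  # persistent directory on the runner, mounted into the job
  local build_dir="$3"  # directory where the job expects the working copy

  if [ ! -d "$cache_dir/.git" ]; then
    # Cache miss: pay for the full network clone once on this runner.
    git clone --quiet "$repo_url" "$cache_dir" || return 1
  fi

  # Cache hit (or freshly populated cache): copy on local disk.
  # This copy is the step whose duration varied widely in testing.
  mkdir -p "$build_dir" || return 1
  cp -a "$cache_dir/." "$build_dir/" || return 1

  # Fetch only the commits added since the cache was populated.
  git -C "$build_dir" fetch --quiet origin
}
```

A call might look like seed_from_cache "$CI_REPOSITORY_URL" /cache/www-gitlab-com "$CI_PROJECT_DIR", where the cache path is a made-up example. The sketch makes both costs visible: the clone branch is the cache-miss penalty, and the cp step is paid on every job even on a hit.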