Implement repository caching in GitLab pre-clone step

In https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/8407, we see that the gitlab-org/gitlab repository is causing high CPU load on file-02 due to CI clones. Each commit can launch hundreds of builds.

The runner does pre-cache the git directories if the machine is re-used, but this doesn't work if shared runners are used.

@ayufan mentioned we could configure all runners to evaluate a predefined environment variable via pre_clone_script (https://docs.gitlab.com/runner/configuration/advanced-configuration.html). For example:

pre_clone_script = "eval \"$CI_PRE_CLONE_SCRIPT\""

The pre_clone_script is injected before the runner's git init.
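
As a rough illustration of the hook's behavior, the sketch below evaluates the variable the same way the `pre_clone_script` line does. The `run_pre_clone` function and the `PROJECT` variable are hypothetical stand-ins for this example, not part of the runner. It shows that a project without `CI_PRE_CLONE_SCRIPT` is unaffected, because evaluating an empty string does nothing.

```shell
#!/bin/sh
# Hypothetical stand-in for the runner hook: evaluate whatever the
# project stored in CI_PRE_CLONE_SCRIPT.
run_pre_clone() {
    eval "$CI_PRE_CLONE_SCRIPT"
}

# Project without the variable: eval of an empty string is a no-op
# that succeeds, so the job continues straight to the normal clone.
unset CI_PRE_CLONE_SCRIPT
run_pre_clone && echo "no script defined, nothing ran"

# Project with the variable: the stored text runs as shell code and
# can see the job's environment (PROJECT is a made-up example name).
PROJECT=demo
CI_PRE_CLONE_SCRIPT='echo "restoring cache for $PROJECT"'
run_pre_clone
```

Because the variable's content runs as arbitrary shell code on the runner, it should only be settable by people who are already trusted to define CI/CD variables for the project.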

Then we can do something like:

  1. Run a scheduled pipeline or a build in prepare phase to upload a .bundle (or tar.gz) to object storage. Using a tarball is significantly faster.
  2. Set CI_PRE_CLONE_SCRIPT to download this archive, if available, and extract it into the build directory ($CI_PROJECT_DIR).

This assumes that having even a slightly old copy of the Git repository is better than cloning anew because there are fewer objects for the server to send and compress.
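
The assumption can be tried locally with a minimal sketch like the one below. It uses throwaway repositories under a temporary directory and empty commits; all paths and the `g` wrapper are made up for the demonstration. A job that starts from a stale archive only has to fetch the commits created after the snapshot was taken.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
# Wrapper so commits work even when no Git identity is configured.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# "Server" repository with one commit.
g init -q "$work/origin"
g -C "$work/origin" commit -q --allow-empty -m "first"

# Scheduled job: clone and archive the clone, .git directory included.
g clone -q "$work/origin" "$work/snapshot"
tar czf "$work/cache.tar.gz" -C "$work/snapshot" .

# The server moves ahead after the snapshot was taken.
g -C "$work/origin" commit -q --allow-empty -m "second"

# CI job: unpack the stale snapshot, then fetch only what is missing.
mkdir "$work/job"
tar xzf "$work/cache.tar.gz" -C "$work/job"
g -C "$work/job" fetch -q origin
behind=$(g -C "$work/job" rev-list --count 'HEAD..@{u}')
echo "commits fetched on top of the cached copy: $behind"

rm -rf "$work"
```

In a real job the server-side saving depends on how old the snapshot is relative to the commit being built, which is what a trial on file-02 would measure.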

Obviously, having this caching inside Gitaly would be preferable, but this would at least be a short-term solution to alleviate file server load on file-02 and to see how effective this might be.

Chef changes

  1. Staging: https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/2310/diffs
  2. Prod: https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/2312/diffs

Pre-clone script

Define a CI/CD variable CI_PRE_CLONE_SCRIPT (it can't be defined in the repository because the repository hasn't been cloned yet when the script runs):

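echo "Downloading archived master..."
# wget -O creates the output file even when the download fails, so test
# wget's exit status instead of checking whether the file exists.
if ! wget -O /tmp/gitlab.tar.gz https://storage.googleapis.com/gitlab-ci-bundle-cache/project-278964/gitlab-master.tar.gz; then
    echo "Repository cache not available, cloning a new directory..."
    rm -f /tmp/gitlab.tar.gz
    exit
fi

# Start from an empty build directory, then unpack the cached repository.
rm -rf "$CI_PROJECT_DIR"
echo "Extracting tarball into $CI_PROJECT_DIR..."
mkdir -p "$CI_PROJECT_DIR"
cd "$CI_PROJECT_DIR"
tar xzf /tmp/gitlab.tar.gz
# The archive is no longer needed once it has been extracted.
rm -f /tmp/gitlab.tar.gz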

Bundle update script

We'd need to periodically update this bundle via something like:

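# Take a fresh clone of master and pack it, .git directory included.
git clone -b master https://gitlab.com/gitlab-org/gitlab.git /tmp/gitlab
cd /tmp/gitlab
# Produces /tmp/gitlab-master.tar.gz, which still has to be uploaded to object storage.
tar czf /tmp/gitlab-master.tar.gz .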
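
Since every job would download whatever the scheduled update publishes, it may be worth checking the archive before it replaces the previous one. The following is only a sketch under that assumption: `build_archive` is a hypothetical helper, not an existing script. It writes to a temporary name, confirms the archive can be listed back, and only then moves it into place.

```shell
#!/bin/sh
# build_archive SRC_DIR DEST_TARBALL  (hypothetical helper)
# Packs SRC_DIR into DEST_TARBALL, leaving no partial file behind on failure.
build_archive() {
    src=$1
    dest=$2
    tmp="$dest.partial"

    # Pack the directory contents, .git included, under a temporary name.
    tar czf "$tmp" -C "$src" . || { rm -f "$tmp"; return 1; }

    # Reject an archive that cannot be read back.
    tar tzf "$tmp" >/dev/null || { rm -f "$tmp"; return 1; }

    # Publish under the final name only after both steps succeeded.
    mv "$tmp" "$dest"
}
```

A scheduled job could then call `build_archive /tmp/gitlab /tmp/gitlab-master.tar.gz` and upload the result only when the call succeeds, for example with `gsutil cp` if Google Cloud Storage tooling is available on the job image (an assumption based on the storage.googleapis.com download URL).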

/cc: @rymai, @jramsay

Edited by Stan Hu