Resolve "Implement repository caching in www-gitlab-com pre-clone step"
NOTE: This change was reverted because it was not performant enough. We will revisit the Docker approach. See details in this comment: !40665 (comment 287868133)
DESCRIPTION
For each job in the www-gitlab-com CI/CD build, the Git repository clone currently takes between one and a half and two minutes, because the repository is very large and cloning it takes a lot of network time.
If we regularly cache a clone of master as a tarball in object storage, and each job then fetches only the commits added to its pipeline branch since the tarball was last built, this time could be greatly reduced.
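The caching idea can be tried locally with nothing but git and tar. In this sketch, which assumes only that both tools are installed, a temporary "upstream" repository stands in for the real www-gitlab-com remote, the snapshot step plays the role of the scheduled job that archives master, and the extract-and-fetch step plays the role of a CI job. Directory names and commit messages are illustrative only.

```shell
#!/usr/bin/env bash
# Local simulation of the tarball cache, assuming only git and tar are installed.
# "upstream" is a stand-in for the real www-gitlab-com remote.
set -euo pipefail

export GIT_AUTHOR_NAME=ci GIT_AUTHOR_EMAIL=ci@example.com
export GIT_COMMITTER_NAME=ci GIT_COMMITTER_EMAIL=ci@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Record an empty commit in the given repository.
commit() {
  git -C "$1" -c commit.gpgsign=false commit --quiet --allow-empty -m "$2"
}

# 1. Upstream repository with existing history on master.
git init --quiet "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
commit "$work/upstream" "existing history"

# 2. Scheduled job: clone master and archive the checkout, .git included.
git clone --quiet --branch master "$work/upstream" "$work/snapshot"
tar czf "$work/master.tar.gz" -C "$work/snapshot" .

# 3. New work lands on master after the snapshot was taken.
commit "$work/upstream" "commit added after the snapshot"

# 4. CI job: extract the snapshot, then fetch only what it is missing.
mkdir "$work/job"
tar xzf "$work/master.tar.gz" -C "$work/job"
git -C "$work/job" fetch --quiet origin master

upstream_tip=$(git -C "$work/upstream" rev-parse master)
fetched_tip=$(git -C "$work/job" rev-parse FETCH_HEAD)
echo "upstream master: $upstream_tip"
echo "fetched tip:     $fetched_tip"
[ "$upstream_tip" = "$fetched_tip" ] && echo "Snapshot plus fetch matches upstream master."
```

Because the extracted archive keeps the .git directory and its origin remote, the final fetch only has to transfer the objects created after the snapshot, which is where the expected time savings come from.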
RELATED ISSUES
This approach has already been successfully implemented for the gitlab-org/gitlab repo, so all we should need to do is replicate it for this gitlab-com/www-gitlab-com repo.
We were previously going to accomplish this by building a Docker image, but it makes more sense to reuse the already-proven approach, which is also simpler than the Docker approach.
TASKS
- Add the schedule to the repo
- Create the bucket (or reuse an existing one)
- Add a CI variable with the credential for the bucket
- Set up the CI_PRE_CLONE_SCRIPT variable (see the documentation here and the required contents below; note that the documentation is not yet merged or available on the live docs site)
- Add the sync stage and job
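The scheduled sync job in the task list boils down to cloning master, archiving the checkout, and uploading the archive to the bucket. The sketch below is a hedged illustration, not the job from this change: the variables REPO_URL, CACHE_BUCKET and GCS_KEY_FILE and both helper functions are hypothetical, and it assumes the bucket lives in Google Cloud Storage (as the download URL in the pre-clone script suggests) and that the job image provides git, tar, gcloud and gsutil. CI_REPOSITORY_URL is GitLab's predefined clone URL variable.

```shell
#!/usr/bin/env bash
# Sketch of a scheduled sync job. Hypothetical names:
#   REPO_URL      repository to snapshot (falls back to GitLab's CI_REPOSITORY_URL)
#   CACHE_BUCKET  destination prefix, for example gs://<bucket>/<path>
#   GCS_KEY_FILE  path to a service-account key taken from a CI variable
set -euo pipefail

# Clone master into a temporary directory and archive the checkout with its
# .git directory, so a job can extract it and then run an incremental fetch.
build_master_tarball() {
  local repo_url=$1 output=$2 tmp
  tmp=$(mktemp -d)
  git clone --quiet --branch master "$repo_url" "$tmp/repo"
  tar czf "$output" -C "$tmp/repo" .
  rm -rf "$tmp"
}

# Upload the archive; skipped when no bucket is configured (e.g. local runs).
upload_tarball() {
  local tarball=$1
  if [ -z "${CACHE_BUCKET:-}" ]; then
    echo "CACHE_BUCKET not set, skipping upload of $tarball"
    return 0
  fi
  gcloud auth activate-service-account --key-file="${GCS_KEY_FILE:?GCS_KEY_FILE must point to a key file}"
  gsutil cp "$tarball" "$CACHE_BUCKET/www-gitlab-com-master.tar.gz"
}

REPO_URL=${REPO_URL:-${CI_REPOSITORY_URL:-}}
if [ -n "$REPO_URL" ]; then
  build_master_tarball "$REPO_URL" /tmp/www-gitlab-com-master.tar.gz
  upload_tarball /tmp/www-gitlab-com-master.tar.gz
fi
```

Keeping the upload behind a CACHE_BUCKET check lets the archiving logic be tried on a machine without credentials; in the scheduled pipeline, the bucket and key would come from CI variables (for example a file-type variable holding the key).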
CI_PRE_CLONE_SCRIPT variable contents:
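(
  echo "Downloading archived master..."
  # wget -O can leave an empty file behind when the download fails, so check the
  # exit status and the file size instead of only testing that the file exists.
  if ! wget -O /tmp/www-gitlab-com-master.tar.gz https://storage.googleapis.com/gitlab-ci-git-repo-cache/project-278964/www-gitlab-com-master.tar.gz ||
     [ ! -s /tmp/www-gitlab-com-master.tar.gz ]; then
    echo "Repository cache not available, cloning a new directory..."
    rm -f /tmp/www-gitlab-com-master.tar.gz
    # Inside the subshell, exit leaves only this snippet; the runner then clones as usual.
    exit
  fi
  rm -rf "$CI_PROJECT_DIR"
  echo "Extracting tarball into $CI_PROJECT_DIR..."
  mkdir -p "$CI_PROJECT_DIR"
  cd "$CI_PROJECT_DIR"
  tar xzf /tmp/www-gitlab-com-master.tar.gz
  rm -f /tmp/www-gitlab-com-master.tar.gz
  chmod a+w "$CI_PROJECT_DIR"
)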
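The fallback behaviour of the pre-clone script is easier to check when its logic is wrapped in a function. This is a hypothetical refactor, not part of the original change: restore_repo_cache takes an already-downloaded archive and a destination directory, returns 1 when the caller should fall back to a normal clone, and adds one safeguard the script does not have, listing the archive with `tar tzf` before deleting the destination so that a corrupt download cannot wipe the directory.

```shell
#!/usr/bin/env bash
# Hypothetical refactor of the pre-clone logic for local testing.
# Usage: restore_repo_cache ARCHIVE DEST
#   returns 0 when DEST was replaced by the archive contents,
#   returns 1 when the caller should fall back to a normal clone.
set -uo pipefail

restore_repo_cache() {
  local archive=$1 dest=$2
  # A missing, empty, or unreadable archive means "no cache": leave DEST untouched.
  if [ ! -s "$archive" ] || ! tar tzf "$archive" > /dev/null 2>&1; then
    echo "Repository cache not available, cloning a new directory..."
    return 1
  fi
  rm -rf "$dest"
  echo "Extracting tarball into $dest..."
  mkdir -p "$dest"
  tar xzf "$archive" -C "$dest"
  chmod a+w "$dest"
}
```

In a job, the download step would run first and the runner's normal clone would proceed whenever the function returns 1.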
Closes #6511