CI runner should avoid listing remote refs when it can
Update 2020-01-19
Upstreaming Git patches has its own issue now: #806 (closed)
We will wrap up this issue by doing the following:
- remove scalability_ci_fetch_sha feature flag gitlab-org/gitlab!51978 (merged)
- update CI documentation with reference to our experiences in this issue gitlab-org/gitlab!52225 (merged)
- production change request to set --no-tags for gitlab-com/www-gitlab-com production#3369
Update 2020-01-18
We now believe we have found two server-side performance issues in git fetch that get worse when there are many refs in the repository. Combined with the large number of CI jobs that fetch gitlab-org/gitlab, this adds up to a lot of wasted CPU on file-cny-01. If we can fix these performance issues, we probably no longer need to change CI runner behavior.
We are working on Git patches that should solve these performance problems. #746 (comment 487602722)
Original issue
When our CI runner runs git fetch, git fetch makes 3 HTTP requests to GitLab. If we change that git fetch command, we can get the same result with 2 HTTP requests.
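One way to check the request counts is to record a curl trace for each form of the command and count the request lines. The sketch below is not what the runner does; it assumes that each HTTP request appears in a GIT_TRACE_CURL trace as one "Send header: GET ..." or "Send header: POST ..." line. The function name count_http_requests and the trace file paths are made up for illustration.

```shell
#!/bin/sh
# Count the HTTP requests recorded in a GIT_TRACE_CURL trace file.
# Assumption: every request contributes exactly one line of the form
# "Send header: GET <path> ..." or "Send header: POST <path> ...".
count_http_requests() {
    grep -cE 'Send header: (GET|POST) ' "$1"
}

# Hypothetical usage against a real remote (needs network access, not run here):
#   GIT_TRACE_CURL=/tmp/old.trace git fetch origin refs/pipelines/XXX:refs/pipelines/XXX
#   GIT_TRACE_CURL=/tmp/new.trace git fetch -n origin "$COMMIT_SHA":refs/pipelines/XXX
#   count_http_requests /tmp/old.trace
#   count_http_requests /tmp/new.trace
```

Authentication retries or redirects would add extra request lines, so the counts are only comparable when both fetches run under the same conditions.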
Details
We know that git fetch traffic from GitLab CI can cause considerable CPU pressure on Gitaly servers. This traffic has two components: refs and objects. I think that at least some of the time, we can avoid the refs traffic.
Here is a recent perf CPU flamegraph from file-cny-01, the Gitaly server that hosts gitlab-org/gitlab. At the time this profile was recorded, the server was at 100% CPU utilization.
Notice how 18% of CPU time is spent in the function ls_refs under cmd_upload_pack. This C function in Git handles the request a Git client sends to list the refs of the remote. I believe that if we change the specific git fetch command issued by the CI runner, we can prevent the corresponding ls_refs call on the server, saving up to 18% of CPU.
We would change this command:
git fetch origin refs/pipelines/XXX:refs/pipelines/XXX
To this command:
git fetch -n origin $COMMIT_SHA:refs/pipelines/XXX
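Since the SHA form only applies when the runner knows the commit, a wrapper would need a fallback to the original refspec. The sketch below shows one way to build the arguments. The helper name pipeline_fetch_args is hypothetical, and it assumes that only a full 40-hex-digit object ID lets Git skip the ref listing.

```shell
#!/bin/sh
# Print the arguments to pass to "git fetch" for a pipeline ref.
#   $1: commit SHA (may be empty)
#   $2: pipeline ref name, e.g. refs/pipelines/XXX
# Hypothetical helper, not part of the CI runner.
pipeline_fetch_args() {
    sha=$1
    ref=$2
    # Assumption: only a full 40-hex-digit object ID qualifies for the SHA form.
    if printf '%s' "$sha" | grep -qE '^[0-9a-f]{40}$'; then
        # -n (--no-tags) disables tag auto-following, which would
        # otherwise require listing the remote refs.
        printf '%s\n' "-n origin $sha:$ref"
    else
        # Fall back to the original command, which lists refs.
        printf '%s\n' "origin $ref:$ref"
    fi
}
```

It could then be used as git fetch $(pipeline_fetch_args "$COMMIT_SHA" refs/pipelines/XXX), with the command substitution left unquoted on purpose so that the output splits into separate arguments.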
Video demo: https://youtu.be/P6iU6uVSEvo
