Lack of buffering makes Git fetch ref advertisements CPU-expensive
I was reviewing the impact of &400 (closed) and noticed that towards the end of that project, when we removed the --no-tags setting from gitlab-org/gitlab production#3842 (closed) and gitlab-com/www-gitlab-com production#3863 (closed), there was a big jump in the number of gRPC messages sent on both Gitaly servers.
Because removing --no-tags puts more ref advertisements on the wire, I became curious about the cost of that ref advertisement traffic. I also occasionally (but not constantly) see Gitaly server CPU flame graphs where more time is spent iterating over and sending refs than sending packfile data.
I also knew from previous investigations that Git sends ref advertisements in a very inefficient way: with 1 or 2 write(2) syscalls per ref. Strace example:
write(1, "288336f9040fa1f641797d8b1e2d155a"..., 58) = 58
write(1, "0041", 4) = 4
write(1, "c443dc7674bfd1316b141d509fce316a"..., 61) = 61
write(1, "003e", 4) = 4
write(1, "8209d163907af6774ae35dc7cb0d913e"..., 58) = 58
write(1, "0041", 4) = 4
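The 4-byte writes in the trace are pkt-line length prefixes: four hex digits giving the length of the line including the prefix itself. That is why "0041" (65) precedes a 61-byte payload and "003e" (62) precedes a 58-byte one. Below is a minimal Python sketch of that framing, assuming each advertised ref line is `<oid> <refname>\n`. The object ID and ref names are made up, chosen only so the payloads come out at 58 and 61 bytes like the ones in the trace.

```python
# Illustration only: pkt-line framing as seen in the strace output.
# OID and the ref names are hypothetical, not taken from the real repository.

OID = "0123456789abcdef0123456789abcdef01234567"  # 40 hex characters


def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its pkt-line length (4 hex digits, counting the prefix)."""
    total = len(payload) + 4
    if total > 65520:  # maximum pkt-line length allowed by the Git protocol
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % total + payload


def ref_line(oid: str, refname: str) -> bytes:
    """Frame one advertised ref, assuming the '<oid> <refname>\\n' layout."""
    return pkt_line(f"{oid} {refname}\n".encode())


if __name__ == "__main__":
    for name in ("refs/tags/v1.2.3", "refs/tags/v1.2.3^{}"):
        line = ref_line(OID, name)
        # Show the prefix and the payload size, the two things strace shows per ref.
        print(line[:4].decode(), len(line) - 4)
```

In the trace, the prefix and the payload each go out in their own write(2), which is where the two syscalls per ref come from.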
So I decided to see if I could patch Git to buffer these ref line writes. gitlab-org/gitlab-git!8 (closed)
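To illustrate the idea (this is not the Git patch itself, which is in C), here is a Python sketch that counts how many times the underlying sink is written to with and without a buffer in front of it. `CountingSink`, `advertise`, and the synthetic refs are hypothetical names and data for this example; the sink stands in for file descriptor 1, and each call to its `write` stands in for one write(2) syscall.

```python
import io


class CountingSink(io.RawIOBase):
    """Stand-in for stdout: records bytes written and how many write calls arrive."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += bytes(b)
        return len(b)


def advertise(out, refs):
    """Emit one pkt-line per ref as two writes (length prefix, then payload)."""
    for oid, name in refs:
        payload = f"{oid} {name}\n".encode()
        out.write(b"%04x" % (len(payload) + 4))
        out.write(payload)
    out.write(b"0000")  # flush-pkt terminating the advertisement


def compare(n):
    """Return (unbuffered_calls, buffered_calls) for n synthetic refs."""
    refs = [(f"{i:040x}", f"refs/tags/v{i}") for i in range(n)]  # synthetic refs

    direct = CountingSink()
    advertise(direct, refs)

    sink = CountingSink()
    buffered = io.BufferedWriter(sink, buffer_size=65536)
    advertise(buffered, refs)
    buffered.flush()  # push whatever is still sitting in the buffer

    assert bytes(sink.data) == bytes(direct.data)  # same bytes on the wire
    return direct.calls, sink.calls


if __name__ == "__main__":
    print(compare(11_000))
```

The bytes sent are identical in both cases. Only the number of writes reaching the sink changes, from two per ref to roughly one per full buffer.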
The impact seems promising. I set up a simple test where I run git ls-remote in a loop against my development GitLab instance, on a repo with 11K advertised refs.
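A loop like that could be driven by something like the following Python sketch. It is an assumption about the shape of the test, not the script actually used: `run_loop` is a hypothetical helper, the repository URL in the usage comment is a placeholder, and the wall-clock time it returns is only a rough companion to the CPU samples collected by the profiler.

```python
import subprocess
import time


def run_loop(cmd, iterations):
    """Run cmd repeatedly, discarding its output; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        # check=True aborts the loop if a run fails, so errors are not silently timed.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


# Example usage (placeholder URL, adjust to your own development instance):
# elapsed = run_loop(["git", "ls-remote", "http://localhost:3000/group/project.git"], 100)
# print(f"100 runs took {elapsed:.1f}s")
```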
Flame graph before, 2000 combined CPU samples for Git/Gitaly/Workhorse/Praefect:
Flame graph after, 630 combined CPU samples: