Lack of buffering makes Git fetch ref advertisements CPU-expensive
I was reviewing the impact of &400 (closed) and noticed that towards the end of that project, when we removed the --no-tags setting from gitlab-org/gitlab production#3842 (closed) and gitlab-com/www-gitlab-com production#3863 (closed), there was a big jump in the number of gRPC messages sent on both Gitaly servers.
Removing --no-tags puts more ref advertisements on the wire, which made me curious about the cost of that ref advertisement traffic. I was also curious because I occasionally (but not constantly) see Gitaly server CPU flame graphs where more time is spent iterating over and sending refs than sending packfile data.
I also knew from previous investigations that Git sends ref advertisements in a very inefficient way: with 1 or 2 write(2) syscalls per ref. Strace example:
write(1, "288336f9040fa1f641797d8b1e2d155a"..., 58) = 58
write(1, "0041", 4) = 4
write(1, "c443dc7674bfd1316b141d509fce316a"..., 61) = 61
write(1, "003e", 4) = 4
write(1, "8209d163907af6774ae35dc7cb0d913e"..., 58) = 58
write(1, "0041", 4) = 4
So I decided to see if I could patch Git to buffer these ref line writes. https://gitlab.com/gitlab-org/gitlab-git/-/merge_requests/8
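To illustrate why buffering helps, here is a minimal Python sketch. It is not the actual patch, which is in C and lives in the merge request above. It assumes each ref is sent as a pkt-line (a 4-hex-digit length prefix followed by the payload) and that the prefix and payload are written separately, as the strace output suggests. The names CountingWriter, advertise and count_writes are made up for this example, and CountingWriter.write() only stands in for a write(2) syscall.

```python
import io


class CountingWriter(io.RawIOBase):
    """Hypothetical raw sink: each write() call stands in for one write(2)."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += b  # copy the bytes immediately; b may be a memoryview
        return len(b)


def advertise(out, refs):
    """Write one pkt-line per ref, length prefix and payload separately."""
    for oid, name in refs:
        payload = f"{oid} {name}\n".encode()
        # The 4-digit hex length prefix counts its own 4 bytes.
        out.write(b"%04x" % (len(payload) + 4))
        out.write(payload)
    out.write(b"0000")  # flush-pkt terminates the advertisement
    out.flush()


def count_writes(refs, buffered):
    """Return (number of raw write calls, bytes produced)."""
    raw = CountingWriter()
    # The 64 KiB buffer size is an arbitrary choice for this sketch.
    out = io.BufferedWriter(raw, buffer_size=65536) if buffered else raw
    advertise(out, refs)
    return raw.calls, bytes(raw.data)


if __name__ == "__main__":
    refs = [("0" * 40, f"refs/heads/b{i}") for i in range(11000)]
    print("unbuffered write calls:", count_writes(refs, buffered=False)[0])
    print("buffered write calls:  ", count_writes(refs, buffered=True)[0])
```

The bytes on the wire are identical in both modes; only the number of calls into the underlying writer changes. Unbuffered, that is two calls per ref plus one for the flush-pkt.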
The impact seems promising. I set up a simple test where I run git ls-remote in a loop against my development GitLab instance, on a repo with 11K advertised refs.
Flame graph before, 2000 combined CPU samples for Git/Gitaly/Workhorse/Praefect:
Flame graph after, 630 combined CPU samples: