git for-each-ref --format= slower than piping to git cat-file --batch
To improve the performance of FindLocalBranches, I refactored the code so that git for-each-ref --format outputs all the commit fields directly, instead of using a separate git cat-file process to get them.
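However, I found that performance actually gets worse as the number of branches grows. At 1000 branches the iterator version is slightly faster, but at 5000 and 10000 branches it is roughly 1.6x to 2x slower, even though it allocates less memory.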
Benchmark Results
With Sorting:
goos: darwin
goarch: arm64
pkg: gitlab.com/gitlab-org/gitaly/v16/internal/gitaly/service/ref
cpu: Apple M3 Max
BenchmarkFindLocalBranches/branches_1000/ref_iterator=false/legacy-16 8 137495391 ns/op 9497109 B/op 92860 allocs/op
BenchmarkFindLocalBranches/branches_1000/ref_iterator=true/iterator-16 10 105919258 ns/op 4026692 B/op 28502 allocs/op
BenchmarkFindLocalBranches/branches_5000/ref_iterator=false/legacy-16 2 517574979 ns/op 46348376 B/op 457235 allocs/op
BenchmarkFindLocalBranches/branches_5000/ref_iterator=true/iterator-16 1 1054941333 ns/op 23088096 B/op 134210 allocs/op
BenchmarkFindLocalBranches/branches_10000/ref_iterator=false/legacy-16 1 2206662625 ns/op 92727552 B/op 913676 allocs/op
BenchmarkFindLocalBranches/branches_10000/ref_iterator=true/iterator-16 1 3997423166 ns/op 43256616 B/op 264640 allocs/op
Without Sorting:
goos: darwin
goarch: arm64
pkg: gitlab.com/gitlab-org/gitaly/v16/internal/gitaly/service/ref
cpu: Apple M3 Max
BenchmarkFindLocalBranches/branches_1000/ref_iterator=false/legacy-16 304 115617168 ns/op 9492061 B/op 92861 allocs/op
BenchmarkFindLocalBranches/branches_1000/ref_iterator=true/iterator-16 330 110676309 ns/op 4165948 B/op 28497 allocs/op
BenchmarkFindLocalBranches/branches_5000/ref_iterator=false/legacy-16 56 640569053 ns/op 46302345 B/op 457195 allocs/op
BenchmarkFindLocalBranches/branches_5000/ref_iterator=true/iterator-16 33 1040771505 ns/op 19344760 B/op 132869 allocs/op
BenchmarkFindLocalBranches/branches_10000/ref_iterator=false/legacy-16 33 1049685692 ns/op 92364733 B/op 912573 allocs/op
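To make the comparison easier to read, the iterator/legacy ratio can be computed from the ns/op figures above. This is just a small sketch; the result type and slowdown function are names made up for illustration. The unsorted 10000-branch case is left out because the iterator figure is not present in the output above.

```go
package main

import "fmt"

// result pairs the legacy and iterator ns/op figures for one benchmark case,
// copied from the benchmark output in this issue.
type result struct {
	name             string
	legacy, iterator float64
}

// slowdown reports how many times slower the iterator path is than the
// legacy path. Values below 1 mean the iterator path is faster.
func slowdown(r result) float64 {
	return r.iterator / r.legacy
}

func main() {
	results := []result{
		{"sorted/1000", 137495391, 105919258},
		{"sorted/5000", 517574979, 1054941333},
		{"sorted/10000", 2206662625, 3997423166},
		{"unsorted/1000", 115617168, 110676309},
		{"unsorted/5000", 640569053, 1040771505},
	}
	for _, r := range results {
		fmt.Printf("%-14s iterator/legacy = %.2fx\n", r.name, slowdown(r))
	}
}
```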
Most of the time was spent in syscalls.
Benchmarking the git commands directly bears this out. The following was run against the linux repository:
> hyperfine \
--warmup 5 \
--runs 100 \
'git for-each-ref --sort=-committerdate --format="%(refname)" | git cat-file --batch' \
'git for-each-ref --sort=-committerdate --format="%(refname)" | git for-each-ref --stdin --format="%(refname),%(objectname),%(authorname),%(subject),%(authoremail),%(authordate:unix),%(authordate:format:%z),%(committername),%(contents),%(committeremail),%(committerdate:unix),%(committerdate:format:%z),%(contents:signature),%(tree),%(parent)"'
Benchmark 1: git for-each-ref --sort=-committerdate --format="%(refname)" | git cat-file --batch
Time (mean ± σ): 36.5 ms ± 1.9 ms [User: 18.4 ms, System: 15.6 ms]
Range (min … max): 32.7 ms … 42.7 ms 100 runs
Benchmark 2: git for-each-ref --sort=-committerdate --format="%(refname)" | git for-each-ref --stdin --format="%(refname),%(objectname),%(authorname),%(subject),%(authoremail),%(authordate:unix),%(authordate:format:%z),%(committername),%(contents),%(committeremail),%(committerdate:unix),%(committerdate:format:%z),%(contents:signature),%(tree),%(parent)"
Time (mean ± σ): 43.8 ms ± 2.4 ms [User: 27.1 ms, System: 11.1 ms]
Range (min … max): 33.9 ms … 51.5 ms 100 runs
Summary
git for-each-ref --sort=-committerdate --format="%(refname)" | git cat-file --batch ran
1.20 ± 0.09 times faster than git for-each-ref --sort=-committerdate --format="%(refname)" | git for-each-ref --stdin --format="%(refname),%(objectname),%(authorname),%(subject),%(authoremail),%(authordate:unix),%(authordate:format:%z),%(committername),%(contents),%(committeremail),%(committerdate:unix),%(committerdate:format:%z),%(contents:signature),%(tree),%(parent)"