Reduce memory allocations in diff parser
I was looking at the continuous profile for Gitaly, in part because of https://gitlab.com/gitlab-com/gl-infra/production/-/issues/4830, and I noticed this profile:
That looks like we allocate a lot of memory for diff parsing. I added a benchmark which gives the following baseline. Note that the benchmark allocates 600MB across 1.3M allocations.
% go test -bench=. -benchmem -memprofile mem.out
goos: linux
goarch: amd64
pkg: gitlab.com/gitlab-org/gitaly/v14/internal/gitaly/diff
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkParser/parse-8 4 254846574 ns/op 608224402 B/op 1338066 allocs/op
PASS
ok gitlab.com/gitlab-org/gitaly/v14/internal/gitaly/diff 2.267s
With the allocation-preventing changes in this MR, those numbers drop to 10.5MB across 178K allocations.
% go test -bench=. -benchmem -memprofile mem.out
goos: linux
goarch: amd64
pkg: gitlab.com/gitlab-org/gitaly/v14/internal/gitaly/diff
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkParser/parse-8 7 146350076 ns/op 10457980 B/op 178041 allocs/op
PASS
ok gitlab.com/gitlab-org/gitaly/v14/internal/gitaly/diff 1.380s
Reusing memory has a higher chance of bugs so this MR only targets the function that the profile highlighted: consumeChunkLine
. The RPC that uses this parser loops over the diffs one at a time and sends them off as gRPC messages, meaning the bytes in parser.Diff()
get copied after each call to parser.Parse()
.