Skip to content

Reduce memory allocations in diff parser

Jacob Vosmaer requested to merge jv-diff-parser-memory into master

I was looking at the continuous profile for Gitaly, in part because of https://gitlab.com/gitlab-com/gl-infra/production/-/issues/4830, and I noticed this profile:

Screenshot_2021-06-09_at_13.52.01

That looks like we allocate a lot of memory for diff parsing. I added a benchmark which gives the following baseline. Note that the benchmark allocates 600MB across 1.3M allocations.

% go test -bench=. -benchmem -memprofile mem.out
goos: linux
goarch: amd64
pkg: gitlab.com/gitlab-org/gitaly/v14/internal/gitaly/diff
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkParser/parse-8         	       4	 254846574 ns/op	608224402 B/op	 1338066 allocs/op
PASS
ok  	gitlab.com/gitlab-org/gitaly/v14/internal/gitaly/diff	2.267s

With the allocation-preventing changes in this MR, those numbers drop to 10.5MB across 178K allocations.

% go test -bench=. -benchmem -memprofile mem.out 
goos: linux
goarch: amd64
pkg: gitlab.com/gitlab-org/gitaly/v14/internal/gitaly/diff
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkParser/parse-8         	       7	 146350076 ns/op	10457980 B/op	  178041 allocs/op
PASS
ok  	gitlab.com/gitlab-org/gitaly/v14/internal/gitaly/diff	1.380s

Reusing memory has a higher chance of bugs so this MR only targets the function that the profile highlighted: consumeChunkLine. The RPC that uses this parser loops over the diffs one at a time and sends them off as gRPC messages, meaning the bytes in parser.Diff() get copied after each call to parser.Parse().

Edited by Jacob Vosmaer

Merge request reports