
Improve memory and CPU use of ParseFile

This function created redundant copies of each file in memory: ReadFull reads into a byte slice, and converting that slice via string() makes Go create a full second copy of the contents.
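To illustrate why the conversion implies a copy, here is a minimal sketch. The function name `convertCopies` is hypothetical and not part of this change. It relies only on the language guarantee that strings are immutable, so the result of string(b) cannot share storage with a slice that is mutated afterwards.

```go
package main

import "fmt"

// convertCopies reports whether string(b) stays unchanged after b is
// mutated, which can only hold if the conversion copied the bytes.
func convertCopies() bool {
	b := []byte("metric 1")
	s := string(b) // copies the contents of b into separate storage
	b[0] = 'M'     // mutate the original slice
	return s == "metric 1"
}

func main() {
	fmt.Println(convertCopies()) // prints "true"
}
```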

I rewrote this function to use strings.Builder instead, copying each file into it chunk by chunk. This is more efficient in terms of both memory and CPU use, because Builder uses a cast internally to produce the string instead of creating a copy in memory:

// String returns the accumulated string.
func (b *Builder) String() string {
	return *(*string)(unsafe.Pointer(&b.buf))
}

Benchmarks

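Memory use is improved by 27% end to end (Stage3; about 43% for Stage1_ParseAllFiles alone), and Stage1 completes 100 iterations instead of 75 in the same benchmark time, i.e. about 25% less time per op, so CPU use is improved too. The number of allocations stays roughly the same (1343 vs. 1249 in Stage1):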

This branch

make bench
go test ./test/benchmark -bench=.
goos: linux
goarch: amd64
pkg: gitlab.com/gitlab-org/gitlab-metrics-exporter/test/benchmark
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
Benchmark_Stage1_ParseAllFiles-8   	     100	  11113185 ns/op	30047309 B/op	    1343 allocs/op
Benchmark_Stage2_MmapProbeAll-8    	      64	  17150730 ns/op	41824333 B/op	   96658 allocs/op
Benchmark_Stage3_MmapRenderAll-8   	      26	  43812038 ns/op	62997431 B/op	   96700 allocs/op
Benchmark_ParseCounterFile-8       	    3501	    361633 ns/op	 1163813 B/op	      40 allocs/op
Benchmark_ParseHistogramFile-8     	    1495	    702273 ns/op	 3438674 B/op	      44 allocs/op
Benchmark_RenderMmapText-8         	      73	  15879798 ns/op	   61864 B/op	      18 allocs/op
PASS
ok  	gitlab.com/gitlab-org/gitlab-metrics-exporter/test/benchmark	8.920s

main branch

make bench
go test ./test/benchmark -bench=.
goos: linux
goarch: amd64
pkg: gitlab.com/gitlab-org/gitlab-metrics-exporter/test/benchmark
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
Benchmark_Stage1_ParseAllFiles-8   	      75	  14855719 ns/op	53026864 B/op	    1249 allocs/op
Benchmark_Stage2_MmapProbeAll-8    	      62	  19088435 ns/op	64655000 B/op	   96528 allocs/op
Benchmark_Stage3_MmapRenderAll-8   	      24	  45855832 ns/op	85830765 B/op	   96571 allocs/op
Benchmark_ParseCounterFile-8       	    2793	    358192 ns/op	 2123477 B/op	      38 allocs/op
Benchmark_ParseHistogramFile-8     	     981	   1169113 ns/op	 6078731 B/op	      44 allocs/op
Benchmark_RenderMmapText-8         	      73	  16185477 ns/op	   61947 B/op	      18 allocs/op
PASS
ok  	gitlab.com/gitlab-org/gitlab-metrics-exporter/test/benchmark	8.901s
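The percentage differences can be re-derived from the raw columns above. The sketch below does that arithmetic for a few rows. `pctReduction` is a hypothetical helper, and the constants are copied from the two benchmark outputs.

```go
package main

import "fmt"

// pctReduction returns by how many percent after is lower than before.
func pctReduction(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// Values are main branch vs. this branch, taken from the output above.
	fmt.Printf("Stage1 B/op:  %.1f%%\n", pctReduction(53026864, 30047309))
	fmt.Printf("Stage3 B/op:  %.1f%%\n", pctReduction(85830765, 62997431))
	fmt.Printf("Stage1 ns/op: %.1f%%\n", pctReduction(14855719, 11113185))
}
```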

Load tests

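The difference here is not remarkable, but it is noticeable at the 95th percentile and above, with latencies up to about 8ms lower. The 90th percentile is essentially unchanged, and mean latency is about 3ms lower: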

This branch

echo "GET http://localhost:8082/metrics" | vegeta attack -rate=10 -duration=10s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         100, 10.10, 10.06
Duration      [total, attack, wait]             9.943s, 9.9s, 42.904ms
Latencies     [min, mean, 50, 90, 95, 99, max]  37.964ms, 46.449ms, 44.885ms, 56.264ms, 60.093ms, 68.529ms, 72.151ms
Bytes In      [total, mean]                     463837274, 4638372.74
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:100  
Error Set:

main branch

echo "GET http://localhost:8082/metrics" | vegeta attack -rate=10 -duration=10s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         100, 10.10, 10.05
Duration      [total, attack, wait]             9.947s, 9.9s, 46.921ms
Latencies     [min, mean, 50, 90, 95, 99, max]  40.481ms, 49.552ms, 48.302ms, 56.069ms, 67.618ms, 76.38ms, 77.584ms
Bytes In      [total, mean]                     463839070, 4638390.70
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:100  
Error Set:
Edited by Matthias Käppler
