Improve memory and CPU use of ParseFile
This function created redundant copies of each file in memory, because ReadFull reads into a byte slice ([]byte). When that slice is turned into a string via string(), Go allocates a new string and copies every byte into it, so each file briefly exists twice in memory.
I rewrote this function to use strings.Builder instead. Each file is copied into the builder chunk by chunk, which is more efficient in terms of both memory and CPU use. The gain comes from Builder.String(), which reinterprets the builder's internal byte slice as a string through an unsafe pointer conversion instead of copying it:
// String returns the accumulated string.
func (b *Builder) String() string {
	return *(*string)(unsafe.Pointer(&b.buf))
}
Benchmarks
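Memory use per operation drops by about 27% for the full render path (Stage3_MmapRenderAll: 63.0 MB vs 85.8 MB) and by about 43% when parsing all files (Stage1_ParseAllFiles: 30.0 MB vs 53.0 MB). Stage1_ParseAllFiles is also about 25% faster (11.1 ms/op vs 14.9 ms/op), and the harness accordingly ran 100 iterations instead of 75. The number of allocations stays roughly the same, rising only slightly (1343 vs 1249 allocs/op for Stage1_ParseAllFiles):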
This branch
make bench
go test ./test/benchmark -bench=.
goos: linux
goarch: amd64
pkg: gitlab.com/gitlab-org/gitlab-metrics-exporter/test/benchmark
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
Benchmark_Stage1_ParseAllFiles-8 100 11113185 ns/op 30047309 B/op 1343 allocs/op
Benchmark_Stage2_MmapProbeAll-8 64 17150730 ns/op 41824333 B/op 96658 allocs/op
Benchmark_Stage3_MmapRenderAll-8 26 43812038 ns/op 62997431 B/op 96700 allocs/op
Benchmark_ParseCounterFile-8 3501 361633 ns/op 1163813 B/op 40 allocs/op
Benchmark_ParseHistogramFile-8 1495 702273 ns/op 3438674 B/op 44 allocs/op
Benchmark_RenderMmapText-8 73 15879798 ns/op 61864 B/op 18 allocs/op
PASS
ok gitlab.com/gitlab-org/gitlab-metrics-exporter/test/benchmark 8.920s
main branch
make bench
go test ./test/benchmark -bench=.
goos: linux
goarch: amd64
pkg: gitlab.com/gitlab-org/gitlab-metrics-exporter/test/benchmark
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
Benchmark_Stage1_ParseAllFiles-8 75 14855719 ns/op 53026864 B/op 1249 allocs/op
Benchmark_Stage2_MmapProbeAll-8 62 19088435 ns/op 64655000 B/op 96528 allocs/op
Benchmark_Stage3_MmapRenderAll-8 24 45855832 ns/op 85830765 B/op 96571 allocs/op
Benchmark_ParseCounterFile-8 2793 358192 ns/op 2123477 B/op 38 allocs/op
Benchmark_ParseHistogramFile-8 981 1169113 ns/op 6078731 B/op 44 allocs/op
Benchmark_RenderMmapText-8 73 16185477 ns/op 61947 B/op 18 allocs/op
PASS
ok gitlab.com/gitlab-org/gitlab-metrics-exporter/test/benchmark 8.901s
Load tests
The difference here is small but noticeable. Mean and median latency are about 3 ms lower, p90 is unchanged, and the tail is about 7.5 to 8 ms lower (p95: 60.1 ms vs 67.6 ms; p99: 68.5 ms vs 76.4 ms):
This branch
echo "GET http://localhost:8082/metrics" | vegeta attack -rate=10 -duration=10s | tee results.bin | vegeta report
Requests [total, rate, throughput] 100, 10.10, 10.06
Duration [total, attack, wait] 9.943s, 9.9s, 42.904ms
Latencies [min, mean, 50, 90, 95, 99, max] 37.964ms, 46.449ms, 44.885ms, 56.264ms, 60.093ms, 68.529ms, 72.151ms
Bytes In [total, mean] 463837274, 4638372.74
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:100
Error Set:
main branch
echo "GET http://localhost:8082/metrics" | vegeta attack -rate=10 -duration=10s | tee results.bin | vegeta report
Requests [total, rate, throughput] 100, 10.10, 10.05
Duration [total, attack, wait] 9.947s, 9.9s, 46.921ms
Latencies [min, mean, 50, 90, 95, 99, max] 40.481ms, 49.552ms, 48.302ms, 56.069ms, 67.618ms, 76.38ms, 77.584ms
Bytes In [total, mean] 463839070, 4638390.70
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:100
Error Set:
Edited by Matthias Käppler