Add support for collecting memory allocator statistics

Matthias Käppler requested to merge 364346-poc-jemalloc-stats into master

What does this MR do and why?

The Ruby VM (as well as the C extensions of gems) uses malloc to satisfy requests for more memory, such as when growing the Ruby heap to store more objects. When deployed via Omnibus or Charts, we do not use the GNU libc allocator but jemalloc, an alternative malloc implementation that aims to optimize memory allocations in multi-threaded environments. The choice and configuration of the memory allocator can substantially affect not only application performance but also long-term memory growth and fragmentation. To improve insight into how the allocator operates, this MR adds a new Ruby interface, which can be used to:

  1. Collect memory allocator statistics and return them as a string.
  2. Write memory allocator statistics to a file.

The output format is either JSON or a human-readable tabular format. This MR merely adds the basic implementation; the data is not yet collected anywhere.

We do not produce these stats ourselves; instead, we use a C function in the allocator library, malloc_stats_print. Since GitLab is a Ruby program, we need a bridge to make this call into C-land. Ruby ships with Fiddle, a wrapper around libffi built for exactly this purpose: it is a two-way bridge between Ruby and C invocations.

So at a high level, what this MR does is:

  • Map the C function malloc_stats_print to a Ruby function.
  • Since malloc_stats_print outputs to stderr by default, which is of limited use to us, intercept its output through a Ruby closure passed as the write callback.
  • Finally, return the output string collected this way, or write it to a file.
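The steps above can be sketched roughly as follows. This is a minimal illustration, not the code added by this MR: the module name JemallocStatsSketch and the FORMAT_OPTIONS constant are hypothetical, and the sketch assumes jemalloc's documented malloc_stats_print signature, where the option string "J" selects JSON output.

```ruby
require 'fiddle'
require 'stringio'

# Hypothetical module name; not the class added by this MR.
module JemallocStatsSketch
  # jemalloc option strings: 'J' selects JSON, '' the default text table.
  FORMAT_OPTIONS = { json: 'J', text: '' }.freeze

  module_function

  # Returns the allocator report as a String, or nil when the process does
  # not export malloc_stats_print (i.e. jemalloc is not loaded).
  def stats(format: :json)
    options = FORMAT_OPTIONS.fetch(format)

    # C signature:
    #   void malloc_stats_print(void (*write_cb)(void *, const char *),
    #                           void *cbopaque, const char *opts)
    stats_print = Fiddle::Function.new(
      Fiddle::Handle.sym('malloc_stats_print'),
      [Fiddle::TYPE_VOIDP, Fiddle::TYPE_VOIDP, Fiddle::TYPE_VOIDP],
      Fiddle::TYPE_VOID
    )

    buffer = StringIO.new
    # The closure replaces the default "print to stderr" behaviour: jemalloc
    # calls it repeatedly with a pointer to the next NUL-terminated fragment.
    write_cb = Fiddle::Closure::BlockCaller.new(
      Fiddle::TYPE_VOID, [Fiddle::TYPE_VOIDP, Fiddle::TYPE_VOIDP]
    ) do |_cbopaque, fragment|
      buffer << fragment.to_s
    end

    stats_print.call(write_cb, nil, options)
    buffer.string
  rescue Fiddle::DLError
    nil # symbol lookup failed: jemalloc is not in use
  end
end
```

Calling JemallocStatsSketch.stats(format: :text) in a process without jemalloc simply returns nil, since the symbol lookup fails before any native call is made.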

Risk & performance

These reports are not yet collected automatically, nor are they available from outside the application. One must invoke these functions directly, e.g. via rbtrace, or integrate them with e.g. an API endpoint or a signal handler to produce them. This means there is no immediate risk in deploying this change. In &8105 we are looking for ways to make this available in a safe manner.
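As one illustration of the signal handler option mentioned above, a dump could be wired to a signal along these lines. This is only a sketch under assumptions: install_stats_signal_handler is a hypothetical helper, the choice of signal is arbitrary, and the block stands in for a call such as Gitlab::Memory::Jemalloc.dump_stats(path: "/tmp"). It is not a proposal for how &8105 should solve this.

```ruby
# Hypothetical helper: runs the given block whenever `signal` is received.
def install_stats_signal_handler(signal, &dump)
  Signal.trap(signal) do
    # Trap handlers run in a restricted context, so hand the actual
    # work to a fresh thread instead of doing it inline.
    Thread.new(&dump)
  end
end

# Example wiring (commented out so loading this file has no side effects):
# install_stats_signal_handler('USR2') do
#   Gitlab::Memory::Jemalloc.dump_stats(path: '/tmp')
# end
```

Running the dump on a separate thread keeps the handler itself short, which matters because the report takes a noticeable amount of time to produce.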

Other considerations:

  • What if jemalloc is not used? Whenever libjemalloc.so is not on LD_PRELOAD (i.e. GitLab is not using it), these functions are no-ops and return nil.
  • How do these calls affect performance? Fiddle is a libffi wrapper, and Ruby releases the GVL for the duration of the libffi function call. This means we won't be blocking other Ruby threads while the native call into malloc_stats_print runs. Any time spent at the Ruby VM level, such as in the closure that collects the output, will still require the GVL, however.
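The GVL behaviour can be observed with a small experiment. The sketch below is not part of this MR; it assumes a POSIX system where libc exports usleep, and the helper name concurrent_native_sleep is made up for illustration. Two threads each make a blocking native call through Fiddle; if the GVL is released during the call, the calls overlap and the total wall-clock time stays close to a single sleep rather than the sum of both.

```ruby
require 'fiddle'

# Calls libc's usleep through Fiddle from several threads at once and
# returns the elapsed wall-clock time in seconds.
def concurrent_native_sleep(threads: 2, microseconds: 300_000)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  Array.new(threads) do
    Thread.new do
      # One Function object per thread keeps the example simple.
      usleep = Fiddle::Function.new(
        Fiddle::Handle::DEFAULT['usleep'],
        [Fiddle::TYPE_INT],
        Fiddle::TYPE_INT
      )
      usleep.call(microseconds)
    end
  end.each(&:join)

  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Example: puts format('%.2fs elapsed', concurrent_native_sleep)
```

The same reasoning applies to malloc_stats_print, with the caveat that every invocation of the Ruby write callback has to reacquire the GVL.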

As far as runtime goes, it is more interesting to look at the JSON report, since it is much larger. On my ThinkPad X1, it takes about 750 ms to dump it to a file, though this is against an idle development system:

git@b23df5e55262:~/gitlab$ bundle exec rbtrace -p $(pgrep -f 'worker 0') -e 'Benchmark.bmbm { |x| x.report { Gitlab::Memory::Jemalloc.dump_stats(path: "/tmp", format: :json) } }'
*** run `sudo sysctl kernel.msgmnb=1048576` to prevent losing events (currently: 16384 bytes)
*** attached to process 191
>> Benchmark.bmbm { |x| x.report { Gitlab::Memory::Jemalloc.dump_stats(path: "/tmp", format: :json) } }
=> [#<Benchmark::Tms:0x00007f88990a5450 @label="", @real=0.7750348010013113, @cstime=0.0, @cutime=0.0, @stime=0.025372000000000172, @utime=0.7467410000000001, @total=0.7721130000000003>]
*** detached from process 191

Logs:

web_1               | Rehearsal ------------------------------------
web_1               |    0.754279   0.023787   0.778066 (  0.779778)
web_1               | --------------------------- total: 0.778066sec
web_1               | 
web_1               |        user     system      total        real
web_1               |    0.750821   0.012818   0.763639 (  0.766589)

Only in production will we be able to get meaningful data for this, but I think the ballpark here is "similar to a slow endpoint request".

Screenshots or screen recordings

Sample output (JSON): jemalloc_stats.json

To test this, libjemalloc must be on LD_PRELOAD.

Produce via:

[39] pry(main)> Gitlab::Memory::Jemalloc.dump_stats(path: '/tmp')

or via rbtrace for a Puma worker:

$ bundle exec rbtrace -p $(pgrep -f 'worker 0') -e 'pp Gitlab::Memory::Jemalloc.dump_stats(path: "/tmp")'
$ bundle exec rbtrace -p $(pgrep -f 'worker 0') -e 'pp Gitlab::Memory::Jemalloc.dump_stats(path: "/tmp", format: :text)'

-rw-r--r--. 1 git  git  1.1M Jun 13 07:32 jemalloc_stats.191.1655105543.json
-rw-r--r--. 1 git  git  372K Jun 13 07:33 jemalloc_stats.191.1655105601.txt
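The file names in the listing appear to follow a prefix.pid.timestamp.extension pattern (191 is the process ID rbtrace attached to, and the third component looks like Unix epoch seconds). A minimal sketch of writing a report under that naming scheme could look like this; write_stats_report is a hypothetical helper, not the method added by this MR.

```ruby
# Hypothetical helper: writes `stats` into `dir` using a
# <prefix>.<pid>.<epoch seconds>.<extension> file name and returns the path.
def write_stats_report(stats, dir:, extension:, prefix: 'jemalloc_stats')
  file_name = [prefix, Process.pid, Time.now.to_i, extension].join('.')
  file_path = File.join(dir, file_name)
  File.write(file_path, stats)
  file_path
end

# Example: write_stats_report('{}', dir: '/tmp', extension: 'json')
```

Including the PID keeps reports from different Puma workers apart, and the timestamp separates successive dumps of the same worker taken at least a second apart.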

Related to #364346 (closed)

Edited by Matthias Käppler
