Optimise GC and `jemalloc` settings for GitLab
This follows up on https://gitlab.com/gitlab-org/memory-team/team-tasks/-/issues/78.
It appears that today we run with default GC settings and default jemalloc settings. These are not tuned for GitLab's workload and result in ever-growing memory usage. With more aggressive GC and jemalloc settings, we should be able to reduce GitLab's memory footprint significantly. For example, the experiment in https://gitlab.com/gitlab-org/memory-team/team-tasks/-/issues/78#note_452730308 reduced memory usage, and we could actually observe memory pages being freed.
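The note linked above tuned jemalloc's page decay. The exact values are in that note; as a hedged sketch, such a configuration would look roughly like this (the values below are illustrative, not the ones we tested):

```shell
# Illustrative values only, not the ones from the linked note: shorten
# jemalloc's page decay from its 10s default so that dirty and muzzy pages
# are returned to the OS after ~1s of non-use. Shorter decay times lower
# RSS at the cost of more page faults under allocation churn.
export MALLOC_CONF="dirty_decay_ms:1000,muzzy_decay_ms:1000"
```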
Outcome
While we learned a lot working on this (findings documented in this issue) and were able to apply those findings to smaller sub-systems such as gitlab-exporter, where they resulted in tangible improvements, we did not ultimately accomplish the goal: reducing the memory use of gitlab-rails.
Brief summary of things we tested and discarded:
- jemalloc page decay: This was a successful change for gitlab-exporter, which now releases memory to the OS faster, but gitlab-exporter is also a low-traffic system. For gitlab-rails, since shorter decay times result in an overall busier system, the latency impact was found to be too great in performance tests.
- `RUBY_GC_HEAP_*` settings: We tested several of these for their impact on memory use, but were not able to identify any that would produce noticeable memory gains without also inducing a latency hit (tightening these settings typically results in more GC activity). Even `RUBY_GC_HEAP_INIT_SLOTS`, which defines the initial heap size in terms of object slots and is a fairly low-risk change, had to be rolled back after deployment because it ended up increasing memory use, the opposite of what we found during testing. (A sketch of what these settings look like follows this list.)
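For reference, these knobs are plain environment variables that the Ruby VM reads at boot. A hedged sketch with illustrative values (not the values we deployed):

```shell
# Illustrative values, not the ones we tested:
export RUBY_GC_HEAP_INIT_SLOTS=1000000        # pre-size the heap to avoid growth-phase GCs
export RUBY_GC_HEAP_GROWTH_FACTOR=1.1         # grow the heap more conservatively (default: 1.8)
export RUBY_GC_HEAP_FREE_SLOTS_MAX_RATIO=0.4  # retain fewer free slots after GC (default: 0.65)
```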
Moreover, establishing a reliable baseline against which to compare alleged improvements turned out to be hard. While we made some headway in improving the stability of gpt runs and documented the approach, when we rolled out even simple changes to our production fleet, they often had the opposite effect of what we observed in our performance test suite. Experimenting directly in production is also hard, since it requires a substantial amount of co-ordination with infrastructure engineers to apply, observe, and perhaps roll back these changes.
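For context, the in-process numbers to compare between runs come from `GC.stat`; below is a minimal sketch of sampling a few of them alongside RSS (these are standard MRI `GC.stat` keys, but our production data came from Prometheus exporters, not from a snippet like this):

```ruby
require "json"

# Sample the resident set size (Linux-only) together with a few GC.stat
# counters that indicate heap size and GC activity. Comparing these
# before/after a change is the minimal baseline we tried to establish.
def memory_sample
  rss_kb = File.read("/proc/self/status")[/VmRSS:\s+(\d+)/, 1].to_i
  gc = GC.stat.slice(:heap_live_slots, :heap_free_slots,
                     :major_gc_count, :minor_gc_count)
  { rss_kb: rss_kb }.merge(gc)
end

puts JSON.pretty_generate(memory_sample)
```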
We still think that tuning both the Ruby GC and the underlying memory allocator could be beneficial for at least some user cohorts of GitLab. However, we should not spend more time on this until:
- We have a performance test suite that better reflects production behavior, or
- We have a means to easily experiment with runtime settings in production without introducing more risk (there are some suggestions for how that could work in gitlab-com/gl-infra/scalability#154)
I have also filed these considerations as #324655.