
Use Puma `nakayoshi_fork`

Aleksei Lipniagov requested to merge puma-nakayoshi_fork into master

What does this MR do?

  • Relates to #288042 (closed).
  • Enables nakayoshi_fork by default.
  • Uses an ENV var for better control of the option.
  • Keeps the `if defined?` check that was needed for compatibility before the Puma 5.1 migration (!48897 (merged), #292918 (closed)). The original plan was to remove it, but we decided to keep it for a few more release cycles.
  • Removes the old nakayoshi_fork gem (https://github.com/ko1/nakayoshi_fork). The "new" nakayoshi_fork (from Puma) should also be an improvement because it runs a compacting GC in addition to everything the gem did (multiple GC cycles).
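
For illustration, the puma.rb change described in the list above could look roughly like the sketch below. The ENV variable name `DISABLE_PUMA_NAKAYOSHI_FORK` and the helper `nakayoshi_fork_enabled?` are assumptions made for this example and may not match the actual diff; `nakayoshi_fork` is the Puma 5.x DSL option.

```ruby
# Sketch only: the ENV variable name and this helper are assumptions for illustration.
# The option is on by default and is switched off only by an explicit 'true'.
def nakayoshi_fork_enabled?(env = ENV)
  env['DISABLE_PUMA_NAKAYOSHI_FORK'] != 'true'
end

# Inside puma.rb the DSL method `nakayoshi_fork` is available on Puma 5.x.
# The `defined?` guard keeps the file loadable where the option does not exist
# (and makes this snippet a no-op when run outside of Puma).
if defined?(nakayoshi_fork)
  nakayoshi_fork if nakayoshi_fork_enabled?
end
```

Keeping the default in one small predicate makes the on/off decision easy to check without booting Puma.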

All config-related MRs for this change (for cross-reference):

Memory difference

The gem we are removing was used exclusively for memory reduction, so removing it should not change how the application behaves.

On top of that, the option we turn on in puma.rb does exactly the same thing (about 4 GC cycles to promote the objects) and then runs GC.compact. So memory usage should improve, or at worst stay the same.
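
As a rough sketch of that pre-fork preparation (this is not Puma's actual source; the method name `prepare_for_fork` and its `cycles` parameter are made up for this example), the behavior amounts to a few minor GC runs followed by a compaction where the Ruby version supports it:

```ruby
# Sketch of the pre-fork preparation described above; not Puma's actual source.
# Returns true if a compaction was requested, false otherwise.
def prepare_for_fork(cycles: 4)
  # Minor GC runs age surviving objects so they get promoted to the old generation.
  cycles.times { GC.start(full_mark: false) }

  # GC.compact exists on Ruby 2.7+; skip it where it is unavailable.
  return false unless GC.respond_to?(:compact)

  GC.compact
  true
rescue NotImplementedError
  # Some platforms define the method but do not support compaction.
  false
end

before = GC.stat[:compact_count].to_i
compacted = prepare_for_fork
after = GC.stat[:compact_count].to_i
puts "compacted=#{compacted} compact_count: #{before} -> #{after}"
```

Running this in the master process before forking is what lets workers share more long-lived pages through copy-on-write.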

Also, note that we don't expect this to make a visible difference in USS/PSS until we configure jemalloc/GC params.
The reason is that, to see any memory gains, we also need to tune the memory allocator to return memory pages to the OS; otherwise, the Ruby GC will simply keep holding on to any pages that may have become free.

So far, I could confirm that there is no negative USS/PSS trend (in fact, the trend is positive in my experiments compared to the master branch) and no visible performance hiccups on webserver start in an Omnibus installation.

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Created a custom Omnibus image and tested manually.

I also verified that GC.compact actually ran by querying the metrics endpoint:

alipniagov@alipniagov-13-jan-nakayoshi:~$ curl localhost:8080/-/metrics/system | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1006    0  1006    0     0  45727      0 --:--:-- --:--:-- --:--:-- 45727
{
  "version": "ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]",
  "gc_stat": {
    "count": 106,
    "heap_allocated_pages": 11005,
    "heap_sorted_length": 11005,
    "heap_allocatable_pages": 0,
    "heap_available_slots": 4485601,
    "heap_live_slots": 2818204,
    "heap_free_slots": 1667397,
    "heap_final_slots": 0,
    "heap_marked_slots": 2737739,
    "heap_eden_pages": 11005,
    "heap_tomb_pages": 0,
    "total_allocated_pages": 11005,
    "total_freed_pages": 0,
    "total_allocated_objects": 26201088,
    "total_freed_objects": 23382884,
    "malloc_increase_bytes": 2211328,
    "malloc_increase_bytes_limit": 28546867,
    "minor_gc_count": 82,
    "major_gc_count": 24,
    "compact_count": 1,
    "remembered_wb_unprotected_objects": 41586,
    "remembered_wb_unprotected_objects_limit": 83168,
    "old_objects": 2532362,
    "old_objects_limit": 5064728,
    "oldmalloc_increase_bytes": 2211328,
    "oldmalloc_increase_bytes_limit": 85217839
  },
  "memory_rss": 921546752,
  "memory_uss": 40919040,
  "memory_pss": 333134848,
  "time_cputime": 0.447578334,
  "time_realtime": 1610632454.765385,
  "time_monotonic": 82991.512774736,
  "worker_id": "puma_0"
}

^ compact_count is 1 when the nakayoshi_fork option is set to true, 0 otherwise.
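
The same check can be scripted rather than read by eye. The sketch below parses a metrics response and reports whether at least one compaction happened; the helper name `compaction_ran?` is hypothetical, and the sample string is a trimmed copy of the output above rather than a live request.

```ruby
require 'json'

# Returns true when the metrics payload reports at least one GC compaction.
# Missing keys are treated as "no compaction".
def compaction_ran?(metrics_json)
  stats = JSON.parse(metrics_json).fetch('gc_stat', {})
  stats.fetch('compact_count', 0).to_i > 0
end

# Trimmed copy of the /-/metrics/system response shown above.
sample = '{"gc_stat": {"count": 106, "compact_count": 1}, "worker_id": "puma_0"}'
puts compaction_ran?(sample)
```

In practice the JSON body would come from the curl call above, for example piped into a small script on standard input.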

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Aleksei Lipniagov
