Allow spawning metrics_server instead of forking into it (!80191) · Merge requests · GitLab.org / GitLab

Matthias Käppler requested to merge 350548-allow-spawn-metrics-server into master Feb 08, 2022

What does this MR do and why?

This is a follow-up to !78527 (merged).

Sidekiq currently forks into the metrics_server module on SaaS to serve metrics to Prometheus. We chose forking there to benefit from memory page sharing, and because the parent process we fork from is already lightweight (it is a Ruby script called sidekiq-cluster).

For Puma, we currently run an in-process metrics server, running in the Puma primary. In &7304 (closed) we are looking to extract this into a separate server process, as we did for Sidekiq.

However, I found that forking into the server from the Puma primary is not desirable, for two reasons:

1. It is less memory efficient than it is for Sidekiq (see the explanation below).
2. In the long run, we are looking to replace our Ruby exporters with a new application exporter written in Go. This means we will need to spawn a new process anyway, since forking will no longer be an option.

This MR therefore adds a new function, MetricsServer.spawn, which executes the bin/metrics-server command instead of forking from the caller. The forking variant was previously named spawn, so I renamed it to fork and gave the name spawn to the new, non-forking variant. Note that spawn is not yet in active use outside of integration tests. This merely paves the way for spawning the server from the Puma primary in a follow-up.
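To illustrate the difference between the two launch strategies, here is a minimal Ruby sketch. The module and method names (ExporterLauncher, fork_server, spawn_server) are hypothetical and are not the MetricsServer code from this MR, and the spawned command is a trivial Ruby one-liner standing in for bin/metrics-server.

```ruby
require 'rbconfig'

# Hypothetical stand-in for a metrics server launcher (not GitLab's MetricsServer).
module ExporterLauncher
  # Fork: the child starts as a copy-on-write copy of the caller's memory.
  def self.fork_server(&block)
    Process.fork do
      block.call
      exit!(0) # leave the child without running at_exit handlers inherited from the parent
    end
  end

  # Spawn: the child is a fresh program with its own memory map.
  def self.spawn_server(*command)
    Process.spawn(*command)
  end
end

forked_pid  = ExporterLauncher.fork_server { :serve_metrics }
spawned_pid = ExporterLauncher.spawn_server(RbConfig.ruby, '-e', 'exit 0')

# Reap both children and collect their exit codes.
statuses = [forked_pid, spawned_pid].map do |pid|
  Process.wait2(pid).last.exitstatus
end

puts "exit codes: #{statuses.inspect}"
```

The forked child shares the parent's loaded code and heap until pages are written to, while the spawned child pays the full start-up cost of a new interpreter but shares nothing with the caller.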

Memory use

To see whether it is more efficient to fork or spawn, I looked at memory maps for both Puma and Sidekiq. The number to focus on is the sum of unique pages (USS) across all processes in a process cluster, since this is unshared memory that adds to real memory use. RSS is very misleading in pre-fork systems, because much of that memory is shared between processes.
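Summing the unique set size across a process cluster can be sketched as follows. This is an illustrative helper, not part of the MR: the total_uss method and the SMEM_ROW pattern are hypothetical names, the sample input is the Sidekiq fork listing from this description, and the values are assumed to be in the kilobyte units that smem prints by default.

```ruby
# Matches one process row of smem output:
# PID, User, Command (may contain spaces), then Swap, USS, PSS, RSS.
SMEM_ROW = /\A\s*(\d+)\s+(\S+)\s+(.*?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\z/

# Hypothetical helper: returns the summed USS of all process rows.
# Header and prompt lines do not match the pattern and contribute 0.
def total_uss(smem_output)
  smem_output.each_line.sum do |line|
    match = SMEM_ROW.match(line.chomp)
    match ? match[5].to_i : 0
  end
end

sample = <<~SMEM
  PID User     Command                         Swap      USS      PSS      RSS
   70 git      ruby /home/git/gitlab/bin/s        0    14384    39763    69220
   72 git      sidekiq_exporter                   0    22444    46188    73168
   74 git      sidekiq 6.4.0 queues:author        0   555560   562334   575196
SMEM

puts total_uss(sample) # prints 592388
```

Parsing from the right-hand numeric columns keeps the helper robust against command names that contain spaces or digits.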

Puma

We can see that puma_exporter, when forked from the primary (pid 7), accounts for 114MB of unshared memory (USS). The rest is shared roughly proportionally with the primary (PSS). When spawned into a new process with its own memory map, puma_exporter consumes merely 14MB of unshared memory, roughly an order of magnitude less than when forking.

This can be explained by memory pages being dirtied by one of these processes post-fork, which triggers copy-on-write, expanding the overall memory used.

//FORK:
git@ced553165878:~/gitlab$ smem -P puma
  PID User     Command                         Swap      USS      PSS      RSS 
   75 git      puma_exporter                      0   114360   227357   494232 
    7 git      puma 5.5.2 (tcp://0.0.0.0:8        0   122920   240603   515152 
   80 git      puma: cluster worker 1: 7 [        0   349256   431325   669252 
   77 git      puma: cluster worker 0: 7 [        0   490452   570804   803872 

//SPAWN:
git@119d7d022ed2:~/gitlab$ smem -P puma
  PID User     Command                         Swap      USS      PSS      RSS 
  107 git      puma_exporter                      0    14308    35562    60996 
    7 git      puma 5.5.2 (tcp://0.0.0.0:8        0   228064   343865   569912 
   79 git      puma: cluster worker 1: 7 [        0   420212   529280   748024 
   77 git      puma: cluster worker 0: 7 [        0   421492   536181   760512

Sidekiq

For comparison, I wanted to show that for Sidekiq, we get a very different picture. This is because Sidekiq does not use a pre-fork setup (there is no "primary Sidekiq"). It also uses a parent process wrapper from which workers are spawned, which itself is just a lightweight Ruby script (pid 70 in the process listing below).

We can see here that the forking model is still better for Sidekiq, since it results in >50% fewer unique pages (37MB vs 82MB across the wrapper and exporter processes) while RSS remains roughly the same, meaning more memory is shared:

//FORK:
git@0b0a35a4d32d:~/gitlab$ smem -P sidekiq
  PID User     Command                         Swap      USS      PSS      RSS 
   70 git      ruby /home/git/gitlab/bin/s        0    14384    39763    69220 
   72 git      sidekiq_exporter                   0    22444    46188    73168 
   74 git      sidekiq 6.4.0 queues:author        0   555560   562334   575196 

//SPAWN:
git@0ec7e1b9cc8c:~/gitlab$ smem -P sidekiq
  PID User     Command                         Swap      USS      PSS      RSS 
   70 git      ruby /home/git/gitlab/bin/s        0    58348    61123    69124 
   76 git      sidekiq_exporter                   0    23844    44819    69660 
   74 git      sidekiq 6.4.0 queues:author        0   558948   565345   578888
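As a quick check of the Sidekiq comparison, the unshared memory of the wrapper script and the exporter can be added up from the two listings above. This is only a worked calculation; the constant names are made up for the example, and the values are the USS columns (in kilobytes) of pid 70 and of sidekiq_exporter.

```ruby
# USS values in kilobytes, copied from the Sidekiq listings
# (wrapper = ruby .../bin/s..., pid 70; exporter = sidekiq_exporter).
FORK_USS  = { wrapper: 14_384, exporter: 22_444 }.freeze
SPAWN_USS = { wrapper: 58_348, exporter: 23_844 }.freeze

fork_total  = FORK_USS.values.sum
spawn_total = SPAWN_USS.values.sum

# Relative saving of forking compared to spawning, in percent.
reduction_percent = ((1 - fork_total.to_f / spawn_total) * 100).round(1)

puts "fork: #{fork_total} KB, spawn: #{spawn_total} KB"
puts "forking uses #{reduction_percent}% less unshared memory"
```

Most of the difference comes from the wrapper process, whose USS grows when the exporter no longer shares pages with it.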

How to set up and validate locally

There are no material changes in behavior outside of some renaming of arguments, so no need to validate this manually.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Related to #350548 (closed)

Edited Feb 09, 2022 by Matthias Käppler
