Sidekiq: Fork into metrics-server instead of `exec`ing
What does this MR do and why?
We recently broke out the in-process metrics server for Sidekiq into its own process. Eventually we are looking to reuse this server in Puma as well.
In the first iteration of metrics-server (!74875 (merged), !75247 (merged)), we used `Process.spawn` to launch the server from the parent process. That is simple, but can be inefficient because it translates into an OS-level clone and `exec` that sets up a new memory map, so almost no memory is shared.
We found in #347199 (closed) that with `fork`, even after mutating some memory regions (e.g. by serving several metrics requests), unique pages drop by an order of magnitude, and ~40-50MB out of 50-60MB end up being shared.
Therefore, we now `fork` from `sidekiq-cluster` instead of using `spawn`. I kept the `bin/metrics-server` script for now, since it is quite useful for testing the server, both in end-to-end automated tests and manually, without having to launch a Sidekiq cluster.
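To illustrate the difference between the two approaches, here is a minimal sketch. It is not the actual `sidekiq-cluster` code: `PRELOADED`, `start_with_fork` and `start_with_spawn` are hypothetical names used only for this example. A forked child sees data the parent already loaded, whereas a spawned interpreter starts from scratch.

```ruby
require 'rbconfig'

# Hypothetical stand-in for everything the parent has loaded before
# the metrics server is started.
PRELOADED = Array.new(100_000) { |i| "entry-#{i}" }

# fork: the child is a copy-on-write clone of the parent, so PRELOADED
# is already available and its pages stay shared until written to.
def start_with_fork
  reader, writer = IO.pipe
  pid = Process.fork do
    reader.close
    writer.write(PRELOADED.size.to_s)
    writer.close
    exit!(0)
  end
  writer.close
  result = reader.read
  reader.close
  Process.wait(pid)
  result.to_i
end

# spawn: a fresh interpreter is started; nothing from the parent's heap
# carries over, so PRELOADED is not defined in the child.
def start_with_spawn
  reader, writer = IO.pipe
  pid = Process.spawn(RbConfig.ruby, '-e',
                      'print Object.const_defined?(:PRELOADED)',
                      out: writer)
  writer.close
  result = reader.read
  reader.close
  Process.wait(pid)
  result
end
```

Calling `start_with_fork` reports the size of the array as seen by the child, while `start_with_spawn` reports whether the constant exists in the new interpreter at all.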
I also cleaned up a few unrelated issues both in tests and code that went unnoticed in the initial implementation of this. I left comments accordingly.
Memory savings
Letting `sidekiq-cluster` and the server run for a while and sending multiple requests to `/metrics`, we see the following results:
git@6e8215dd1600:~/gitlab$ smem -P 'sidekiq-cluster'
PID User Command Swap USS PSS RSS
199 git /usr/bin/python /usr/bin/sm 0 8632 9022 12228
69 git ruby /home/git/gitlab/bin/s 0 11028 28128 50912
71 git ruby /home/git/gitlab/bin/s 0 25792 42401 63152
PID 71 is the metrics server, `fork`ed from PID 69.
I looked at the memory maps of 71 as well, and shared pages sum up to about 40MB:
71: ruby /home/git/gitlab/bin/sidekiq-cluster * -P /home/git/gitlab/tmp/pids/sidekiq-cluster.pid -e development -e development
... Size KernelPageSize MMUPageSize Rss Pss Shared_Clean Shared_Dirty Private_Clean Private_Dirty Referenced Anonymous ...
...
====== ============== =========== ===== ===== ============ ============ ============= ============= ========== =========
270960 1244 1244 61864 41122 6124 31184 0 24556 36720 55740
Shared_Clean + Shared_Dirty = 6124 + 31184 =~ 37MB
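The same sum can be reproduced with a small script. This is a sketch that assumes the per-mapping `Shared_Clean:` / `Shared_Dirty:` lines found in `/proc/<pid>/smaps` on Linux, rather than the totals row shown above; `shared_kb` is a hypothetical helper name.

```ruby
# Sums the Shared_Clean and Shared_Dirty fields (in kB) across all
# mappings in smaps-formatted text.
def shared_kb(smaps_text)
  smaps_text.each_line.sum do |line|
    match = line.match(/\A(Shared_Clean|Shared_Dirty):\s+(\d+) kB/)
    match ? match[2].to_i : 0
  end
end

# Example input using the totals reported above for PID 71.
sample = <<~SMAPS
  Shared_Clean:       6124 kB
  Shared_Dirty:      31184 kB
  Private_Clean:         0 kB
  Private_Dirty:     24556 kB
SMAPS

puts "shared: #{shared_kb(sample)} kB"
```

On a live system the input would come from something like `File.read("/proc/#{pid}/smaps")`, which requires permission to read that process's maps.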
It is unclear which process has written to shared pages and to what extent that will keep happening over the lifetime of those processes, so the shared portion will likely shrink over time.
Before
Comparing this to memory use on master:
git@00817cb3936d:~/gitlab$ smem -P 'sidekiq-cluster|metrics-server'
PID User Command Swap USS PSS RSS
166 git /usr/bin/python /usr/bin/sm 0 8652 8946 12352
69 git ruby /home/git/gitlab/bin/s 0 42340 43434 51028
71 git ruby /home/git/gitlab/bin/m 0 68616 69738 77500
Almost all pages in both processes are unique to the process, i.e. private anonymous memory (USS = unique set size). Those pages are not shared and hence bloat physical memory use by that amount.
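As a rough comparison, the PSS columns of the two `smem` outputs can be added up. This is only a sketch: the two measurements were taken in different containers, so the difference is indicative rather than exact, and `total_kb` is a hypothetical helper.

```ruby
# PSS values in kB, copied from the smem outputs in this description.
PSS_BEFORE = { 'sidekiq-cluster' => 43_434, 'metrics-server (spawned)' => 69_738 }.freeze
PSS_AFTER  = { 'sidekiq-cluster' => 28_128, 'metrics-server (forked)' => 42_401 }.freeze

# PSS divides each shared page among the processes sharing it, so the
# per-process values can be summed without double counting.
def total_kb(pss_by_process)
  pss_by_process.values.sum
end

def saved_kb
  total_kb(PSS_BEFORE) - total_kb(PSS_AFTER)
end

puts "before: #{total_kb(PSS_BEFORE)} kB"
puts "after:  #{total_kb(PSS_AFTER)} kB"
puts "saved:  #{saved_kb} kB"
```

With these numbers the combined PSS drops from 113172 kB to 70529 kB, a difference of 42643 kB, which is in line with the ~40MB of shared pages observed above.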
How to set up and validate locally
To run the server manually:
- run:
METRICS_SERVER_TARGET=sidekiq bin/metrics-server
- verify:
curl localhost:3807/metrics
The response might be empty, unless there are metrics db files left over from previous sidekiq runs.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #347199 (closed)