
Future-proof the Puma worker scaling algorithm

Matthias Käppler requested to merge mk-new-puma-worker-killer-scaler into master

What does this MR do?

We currently determine the default Puma worker count as follows (a code sketch follows the list):

  1. Take the total node RSS available
  2. Subtract a fixed amount reserved for other services
  3. Divide the result by the assumed per-worker RSS
  4. Take the smaller of core count and this value
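
For illustration, here is a minimal sketch of that calculation; the function name and the constants (reserved memory, per-worker RSS) are assumptions made up for this example, not the actual defaults:

```ruby
# A minimal sketch of the current calculation. All names and constants here
# are illustrative assumptions, not the actual defaults.
RESERVED_RAM_GB = 1.5   # assumed fixed amount set aside for other services
PER_WORKER_RSS_GB = 1.0 # assumed RSS per Puma worker

def legacy_puma_worker_count(cores:, total_ram_gb:)
  # Memory left for Puma after reserving a fixed amount for other services,
  # divided by the assumed per-worker RSS.
  memory_workers = ((total_ram_gb - RESERVED_RAM_GB) / PER_WORKER_RSS_GB).floor

  # Never run more workers than there are cores.
  [cores, memory_workers].min
end

legacy_puma_worker_count(cores: 8, total_ram_gb: 8) # => 6 with these assumed constants
```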

I think the general approach is good, but it uses hard-coded values for memory requirements that had become outdated. Determining the amount of memory to reserve for other components is not straightforward and also depends on which services will run alongside Puma on a given node, making this value fairly arbitrary.

I therefore wanted to make a change that iteratively improves this by reducing the need to hard-code memory limits. This change instead uses a heuristic to decide how many Puma workers to run based on both the available core count and node memory, which should hold up better over time.

It maintains the assumption of Puma workers using roughly 1GB of RSS, but instead (see the sketch after this list):

  1. Considers both the "memory class" (low memory vs high memory) and the ratio of RAM available compared to core count
  2. When bound by cores, it simply uses core count
  3. When bound by memory, it corrects using a static factor on smaller boxes (4-8GB of RAM), but lets Puma "grow into" available memory on larger boxes
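
A rough sketch of what this heuristic could look like follows; the memory-class boundary, scale factor, and function name are illustrative assumptions, not the exact values or code in this MR, and the core-vs-memory binding falls out of taking the minimum:

```ruby
# A sketch of the new heuristic. The boundary and factor are illustrative
# assumptions only; the real values were tuned to reproduce the table below.
PER_WORKER_RSS_GB = 1.0        # assumed RSS per Puma worker
LOW_MEMORY_CLASS_GB = 8        # assumed boundary between "low" and "high" memory nodes
LOW_MEMORY_SCALE_FACTOR = 0.75 # assumed static correction for small (4-8GB) boxes

def puma_worker_count(cores:, total_ram_gb:)
  memory_workers =
    if total_ram_gb < LOW_MEMORY_CLASS_GB
      # Low memory class: correct the memory-derived count with a static factor.
      (total_ram_gb * LOW_MEMORY_SCALE_FACTOR / PER_WORKER_RSS_GB).floor
    else
      # High memory class: let Puma "grow into" the available memory.
      (total_ram_gb / PER_WORKER_RSS_GB).floor
    end

  # When bound by cores, simply use the core count.
  [cores, memory_workers].min
end
```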

I think this is more future-proof because really the only variable here is the per-worker memory assumption, and it still produces reasonable results even if that should change slightly over time.

A requirement I tried to maintain was that it produces the same worker counts as the old algorithm for a number of RAM/core combinations. I did so because all our reference architectures were derived from the old algorithm, so this makes the change much less disruptive. Below is a table of the values this produces for a given number of cores and node memory:

| Cores | Memory (GB) | Workers |
|-------|-------------|---------|
| 2     | 4           | 2       |
| 2     | 6           | 2       |
| 2     | 8           | 2       |
| 4     | 3.6         | 2       |
| 4     | 4           | 2       |
| 4     | 6           | 4       |
| 4     | 8           | 4       |
| 8     | 4           | 2       |
| 8     | 6           | 4       |
| 8     | 7.2         | 5       |
| 8     | 8           | 6       |
| 8     | 12          | 8       |
| 8     | 16          | 8       |
| 16    | 4           | 2       |
| 16    | 8           | 6       |
| 16    | 16          | 14      |
| 16    | 32          | 16      |

Source
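
One way to sanity-check a candidate implementation against these targets is to replay the same cores/RAM combinations; this assumes a `puma_worker_count(cores:, total_ram_gb:)` function like the hypothetical sketch above:

```ruby
# Expected worker counts from the table above, keyed by [cores, memory in GB].
EXPECTED_WORKERS = {
  [2, 4]   => 2, [2, 6]   => 2, [2, 8]   => 2,
  [4, 3.6] => 2, [4, 4]   => 2, [4, 6]   => 4, [4, 8]  => 4,
  [8, 4]   => 2, [8, 6]   => 4, [8, 7.2] => 5, [8, 8]  => 6,
  [8, 12]  => 8, [8, 16]  => 8,
  [16, 4]  => 2, [16, 8]  => 6, [16, 16] => 14, [16, 32] => 16
}.freeze

EXPECTED_WORKERS.each do |(cores, ram_gb), expected|
  actual = puma_worker_count(cores: cores, total_ram_gb: ram_gb)
  status = actual == expected ? "OK" : "MISMATCH"
  puts format("%2d cores, %4.1f GB: expected %2d, got %2d (%s)",
              cores, ram_gb, expected, actual, status)
end
```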

This approach still has (or retains) several drawbacks:

  1. It is more complicated than using statically defined numbers
  2. It does not account for memory sharing. In a future iteration, we could base the per-worker memory assumption on USS/PSS instead of RSS, but that complicates things further.
  3. It does not account for how Puma is deployed (single-node GitLab vs dedicated Puma node)

Related issues

gitlab#334831 (closed)

Checklist

See Definition of done.

For anything in this list which will not be completed, please provide a reason in the MR discussion

Required

  • Merge Request Title and Description are up to date, accurate, and descriptive
  • MR targeting the appropriate branch
  • MR has a green pipeline on GitLab.com
  • Pipeline is green on dev.gitlab.org if the change is touching anything besides documentation or internal cookbooks
  • trigger-package has a green pipeline running against latest commit

Expected (please provide an explanation if not completing)

  • Test plan indicating conditions for success has been posted and passes
  • Documentation created/updated
  • Tests added
  • Integration tests added to GitLab QA
  • Equivalent MR/issue for the GitLab Chart opened
