cgroup: Allow a repository to use up to M repository cgroups instead of one

For #5689 (closed)

By default, a repository can spawn processes in at most one cgroup. If there are more repositories than cgroups (which is likely), multiple repositories share the same cgroup. This model works very well when the repositories managed by cgroups are similar in size and traffic. If a node hosts some enormous repositories (a monorepo, for example), the cgroups they map to become excessively large compared to the rest. This imbalance might force operators to lift the repository-level cgroups. As a result, the isolation is no longer effective.

Screenshot_2023-11-20_at_21.45.49: Cgroup memory is concentrated in the top one or two cgroups while the node has 250 repository-level cgroups.
Screenshot_2023-11-20_at_21.46.04: That cgroup is too crowded during peak traffic, leading to persistent CPU throttling.
Source

More about cgroup fragmentation and imbalance in this comment: &10734 (comment 1636514645)

This MR allows a repository to use up to M repository cgroups instead of one. This feature is designed to balance resource usage between cgroups, mitigate competition for resources within a single cgroup, and improve memory usage efficiency and isolation. The value can be adjusted based on the specific workload and the number of repository cgroups on the node. A Git process uses its target repository's relative path as the hash key to find the corresponding cgroup. It is then allocated randomly to one of the MaxCgroupsPerRepo consecutive cgroups starting at that one, wrapping around if needed.

Manual tests

Nit: the PID in the logs below is Gitaly's, not that of the spawned Git processes. I'm too lazy to re-take those screenshots.

Default (max_cgroups_per_repo = 1)

[cgroups]
mountpoint = "/sys/fs/cgroup/"
hierarchy_root = "gitaly"
memory_bytes = 40000000000
[cgroups.repositories]
count = 10
memory_bytes = 5000000000
Logs Screenshot_2023-11-20_at_21.36.06

max_cgroups_per_repo = 2

[cgroups]
mountpoint = "/sys/fs/cgroup/"
hierarchy_root = "gitaly"
memory_bytes = 40000000000
[cgroups.repositories]
count = 10
max_cgroups_per_repo = 2
memory_bytes = 5000000000
Logs Screenshot_2023-11-20_at_21.38.43

max_cgroups_per_repo = 3 (wrap-around)

[cgroups]
mountpoint = "/sys/fs/cgroup/"
hierarchy_root = "gitaly"
memory_bytes = 40000000000
[cgroups.repositories]
count = 10
max_cgroups_per_repo = 3
memory_bytes = 5000000000
Logs Screenshot_2023-11-20_at_21.39.46

max_cgroups_per_repo = 10

[cgroups]
mountpoint = "/sys/fs/cgroup/"
hierarchy_root = "gitaly"
memory_bytes = 40000000000
[cgroups.repositories]
count = 10
max_cgroups_per_repo = 10
memory_bytes = 5000000000

This setting essentially means all the repositories share the same cgroup pool. There is no repository-level containment. Instead, it's command-level containment.

Logs Screenshot_2023-11-20_at_21.41.43
Edited by Quang-Minh Nguyen