cgroup: Allow a repository to use up to M repository cgroups instead of one
Currently, when a git process is spawned, we assign it to a fixed child repository cgroup. For a large, busy repository, traffic might be significantly higher than for other repositories on the node. When there are many repository cgroups, this allocation scheme creates a serious imbalance between them. This has two consequences:
- The biggest cgroup accounts for most of the memory usage, and processes within that cgroup compete for its resources.
- When memory usage is persistently high, we must set a high hard limit; otherwise, the busiest repository fails to function. A high limit makes the protection less effective.
As a result, most of the time the parent cgroup reaches its limit before any repository cgroup does. This leads to a high memory eviction rate, high iowait, and weaker isolation.
This issue proposes allowing a repository to spread its processes across at most M cgroups instead of one. The change is made at this line. This change leads to some interesting results:
- It still keeps soft containment between repositories: a repository can only affect the repositories that share its M cgroups. We also have other protection layers, notably the per-repository and per-IP limiters, which prevent a single repository from dominating all the traffic.
- It provides better distribution: a repository's commands are balanced across its M cgroups.
- Memory usage of the biggest cgroups decreases while that of the smaller ones increases, making the imbalance less severe.
Page cache accounting is also worth discussing. The page cache is managed by the kernel and shared between processes. Each page is charged to exactly one cgroup: in a simplified view, the cgroup that accesses it most frequently (strictly, the cgroup that first brings it into memory, until the page is released). So spreading a repository across M cgroups does not duplicate its page cache.
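To see how page cache is distributed across repository cgroups, one can read the `file` counter (page cache, in bytes) from each cgroup's `memory.stat` under cgroup v2. The sketch below parses that counter from the file's contents; the cgroup names and sample values are hypothetical, and on a real node the contents would come from `os.ReadFile` on a deployment-specific cgroup path.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// fileCacheBytes extracts the "file" entry from the contents of a cgroup v2
// memory.stat file. Lines such as "file_mapped" or "file_dirty" are ignored
// because only an exact "file" key matches.
func fileCacheBytes(memoryStat string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(memoryStat))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && fields[0] == "file" {
			return strconv.ParseUint(fields[1], 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, errors.New(`no "file" entry in memory.stat`)
}

func main() {
	// Hypothetical memory.stat excerpts for two repository cgroups.
	samples := map[string]string{
		"repos-3": "anon 1048576\nfile 8388608\nfile_mapped 4096\n",
		"repos-4": "anon 524288\nfile 4194304\nfile_dirty 0\n",
	}
	var total uint64
	for name, stat := range samples {
		n, err := fileCacheBytes(stat)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %d bytes of page cache\n", name, n)
		total += n
	}
	fmt.Printf("total: %d bytes\n", total)
}
```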
Ultimately, we can lower the limits and strengthen both protection and repository containment. This also addresses the page cache issue and the high reclaim rate mentioned in &10734 (comment 1632048830).
Picking a reasonable M value is another matter. A good starting point is 10-20% of the repository cgroup count, though the best value may depend on the node's workload. On the CNY node, we could set it as high as 40%.
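Turning a percentage into a concrete M is simple arithmetic, but small nodes need a floor so that M never drops to zero. A minimal sketch, assuming M is derived as a whole percentage of the repository cgroup count, rounded down, and clamped to the range [1, cgroup count]; the helper name `maxCgroupsPerRepo` and the 1,000-cgroup node are hypothetical.

```go
package main

import "fmt"

// maxCgroupsPerRepo converts a percentage of the repository cgroup count into
// an M value. Integer arithmetic rounds down and avoids floating-point error;
// the result is clamped to [1, cgroupCount].
func maxCgroupsPerRepo(cgroupCount, percent int) int {
	m := cgroupCount * percent / 100
	if m < 1 {
		return 1
	}
	if m > cgroupCount {
		return cgroupCount
	}
	return m
}

func main() {
	// Hypothetical node with 1,000 repository cgroups.
	for _, p := range []int{10, 20, 40} {
		fmt.Printf("%d%% -> M = %d\n", p, maxCgroupsPerRepo(1000, p))
	}
}
```

On such a node, the suggested 10-20% range gives M between 100 and 200, and 40% gives 400.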