
limiter: Implement Cgroup resource watchers

For #5370 (closed), #5397 (closed)

To build adaptive concurrency limiting, we need to monitor resource usage relative to the corresponding capacity. Both cgroup v1 and cgroup v2 expose what we need, and all of the information is accessible via cgroupfs. We can implement a poller that reads that information periodically and converts it to a usable format. The update frequency can be tuned later, but it should be on the order of 10 seconds.
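As a rough illustration of the "read and convert" step, here is a minimal sketch that parses the flat "key value" layout used by cgroupfs files such as `cpu.stat`. The function name `parseFlatKeyed` and the sample content are assumptions made for this example, not code from this MR. A real poller would read the file from the cgroup's directory on a timer, and the path depends on how cgroupfs is mounted.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseFlatKeyed is a hypothetical helper that parses cgroupfs files where
// each line has the form "<key> <value>", for example cpu.stat.
func parseFlatKeyed(content string) (map[string]uint64, error) {
	stats := make(map[string]uint64)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed line %q", scanner.Text())
		}
		value, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing value of %q: %w", fields[0], err)
		}
		stats[fields[0]] = value
	}
	return stats, scanner.Err()
}

func main() {
	// Sample content in the shape of a cgroup v1 cpu.stat file;
	// throttled_time is reported in nanoseconds there.
	sample := "nr_periods 120\nnr_throttled 30\nthrottled_time 6000000000\n"

	stats, err := parseFlatKeyed(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("throttled_time:", stats["throttled_time"])
}
```

Keeping the parsing separate from the file read makes the conversion easy to unit test without a real cgroup hierarchy.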

Cgroup v2 supports semi-realtime notifications with Pressure Stall Information (PSI). With PSI, a stall threshold is registered, and the OS feeds the resulting events to Gitaly without the need for polling. Unfortunately, support for cgroup v2 in Gitaly is still at an early stage (!5547 (merged)), and PSI also needs to be enabled in the kernel. So the poller needs to be built regardless.

This MR implements two watchers: Cgroup Memory Watcher and Cgroup CPU Watcher.

  • The memory watcher returns a backoff event when the usage exceeds 90% of the memory limit or the cgroup is under OOM.
  • The CPU watcher returns a backoff event if the throttled time since the last poll exceeds 50% of the observation window.

Containerd's cgroups repository supports stats collection out of the box:

Edited by Quang-Minh Nguyen
