Make thresholds of CPU and Memory watchers configurable

In the current implementation, the adaptive limiting kicks in when the resource level exceeds some hard-coded thresholds:

  • Memory: usage exceeds 90% of the parent cgroup's memory limit (source).
  • CPU: the cgroup is throttled for more than 50% of the observation time (source).

Although the current CPU throttling threshold is reasonable, it does not suit every case. A more powerful machine can tolerate a higher throttling rate, while a less powerful machine may need the limiter to kick in sooner. This commit makes the CPU throttling threshold configurable.
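To illustrate the idea, here is a minimal Go sketch of configurable thresholds that fall back to the previously hard-coded values. The names `WatcherConfig`, `resolve`, and `ShouldBackoff` are hypothetical and are not taken from this merge request; the actual configuration keys and watcher code may differ.

```go
package main

import "fmt"

// Defaults mirroring the hard-coded values described above.
const (
	defaultMemoryThreshold       = 0.9
	defaultCPUThrottledThreshold = 0.5
)

// WatcherConfig is a hypothetical config struct (not the real one).
// A zero value for a field means "use the default".
type WatcherConfig struct {
	MemoryThreshold       float64
	CPUThrottledThreshold float64
}

// resolve returns the configured value when it lies in (0, 1],
// otherwise it returns the fallback.
func resolve(configured, fallback float64) float64 {
	if configured > 0 && configured <= 1 {
		return configured
	}
	return fallback
}

// ShouldBackoff reports whether either resource level exceeds its threshold.
// Both ratios are expected to be in [0, 1].
func ShouldBackoff(cfg WatcherConfig, memoryUsedRatio, cpuThrottledRatio float64) bool {
	mem := resolve(cfg.MemoryThreshold, defaultMemoryThreshold)
	cpu := resolve(cfg.CPUThrottledThreshold, defaultCPUThrottledThreshold)
	return memoryUsedRatio > mem || cpuThrottledRatio > cpu
}

func main() {
	// With defaults, 85% memory usage does not trigger the limiter.
	fmt.Println(ShouldBackoff(WatcherConfig{}, 0.85, 0.30))
	// With the memory threshold lowered to 80%, the same usage does.
	fmt.Println(ShouldBackoff(WatcherConfig{MemoryThreshold: 0.8}, 0.85, 0.30))
}
```

Treating a zero value as "use the default" keeps existing configurations working unchanged, which is one plausible way to roll out a new setting like this.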

In a recent incident, the limiter worked but was triggered a bit late. By the time memory usage reaches 90%, the remaining headroom may already be tight, and in-flight operations (usually expensive ones) can fill up the rest very quickly.

When memory usage reaches 100%, many problems can occur: high memory pressure leading to major page faults, failed memory allocations, high iowait (caused by the page faults), OOM kills, and so on. In-flight requests may not be able to finish at this stage, so it makes sense to increase the headroom by lowering the threshold.
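The relationship between threshold and headroom can be sketched with a little arithmetic. The helper `HeadroomBytes` and the 100 GiB limit below are illustrative assumptions, not values from the incident or from this merge request.

```go
package main

import "fmt"

// HeadroomBytes returns how many bytes remain between the point where the
// limiter triggers and the cgroup memory limit.
// thresholdPercent is clamped to at most 100.
func HeadroomBytes(limitBytes, thresholdPercent uint64) uint64 {
	if thresholdPercent > 100 {
		thresholdPercent = 100
	}
	// Integer arithmetic; divide first to avoid overflow on large limits.
	return limitBytes - limitBytes/100*thresholdPercent
}

func main() {
	const gib = uint64(1) << 30
	limit := 100 * gib // hypothetical parent cgroup memory limit

	for _, pct := range []uint64{90, 80} {
		fmt.Printf("threshold %d%%: headroom %d GiB\n", pct, HeadroomBytes(limit, pct)/gib)
	}
}
```

Under these assumptions, lowering the threshold from 90% to 80% doubles the space left for in-flight operations to finish after the limiter starts rejecting new work.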

Edited by Quang-Minh Nguyen
