Ingress-nginx - restrict the number of worker-processes
## What does this MR do and why?
As explained in issue #2361 (closed), nginx is subject to OOM kills because its memory usage exceeds the limit recently set at 1Gi.
Nginx memory consumption depends heavily on the `worker_processes` parameter, which is set to `auto` by default: the number of workers then matches the number of CPUs on the host machine (or the CPU limit set on the pod side): https://nginx.org/en/docs/ngx_core_module.html#worker_processes
In bare-metal deployments this count can be high (> 96), which leads to high memory consumption as soon as the pod starts up (as explained in issue #2361 (closed)). In the sylva CI we only hit this OOM problem on the virt-capm3 bootstrap cluster and not on the management cluster, because the management cluster nodes have fewer CPUs (https://gitlab.com/sylva-projects/sylva-core/-/blob/main/environment-values/base-capm3-virt/high-availability/high-availability.yaml?ref_type=heads#L660).
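One quick way to observe this is to look at the nginx configuration the controller actually renders (an illustrative check; the pod name is taken from the measurements below, and the printed value is what one would expect on a 96-CPU node):

```sh
# Illustrative check: inspect the worker_processes directive in the
# rendered nginx.conf inside the controller pod.
kubectl -n kube-system exec rke2-ingress-nginx-controller-trrhp -- \
  grep worker_processes /etc/nginx/nginx.conf
# Expected on a 96-CPU bare-metal node with the default `auto`:
# worker_processes 96;
```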
It is recommended not to exceed 24 worker processes (https://github.com/kubernetes/ingress-nginx/issues/3574#issuecomment-448229118), and it is hard to pick a single value that fits all our deployment cases (VMs, bare metal, laptops, etc.).
8 seems to me a reasonable value for most cases (but, as the documentation indicates, a value that is too low can cause performance problems).
Another possibility would be to leave it at `auto` except for bare-metal cases (and set it to 24 there, for example), but that would still leave the bootstrap cluster, which can run just about anywhere.
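For illustration, on RKE2 this value can be overridden through a `HelmChartConfig`: in the standard rke2-ingress-nginx chart, `controller.config` entries are rendered into the controller ConfigMap, whose `worker-processes` key drives nginx's `worker_processes` directive. A minimal sketch of that mechanism (the exact place where this MR sets the value in sylva-core is not reproduced here):

```sh
# Hedged sketch: override rke2-ingress-nginx chart values via a
# HelmChartConfig, the standard RKE2 mechanism for packaged charts.
kubectl apply -f - <<'EOF'
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      config:
        # Rendered into the controller ConfigMap; sets nginx's
        # `worker_processes` directive.
        worker-processes: "8"
EOF
```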
## Related reference(s)
Close #2361 (closed)
## Test coverage
Tested on capm3 (real bare-metal servers) and capo.

After tuning the parameter, I ran some benchmarks with the `ab` tool (apache2-utils) and did not see any loss in the results. However, this in no way proves that there is no loss under real-world load.
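For readers unfamiliar with `ab`, an invocation of this shape exercises the ingress (illustrative only; the request counts and target URL are placeholders, not the exact ones used in the test):

```sh
# Illustrative ab run: 10000 requests, 100 concurrent, through the
# ingress (hostname is a placeholder).
ab -n 10000 -c 100 https://my-app.sylva.example/
```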
With `worker_processes` set to `auto` (96 workers, if I remember well):
```
kubectl top pod rke2-ingress-nginx-controller-trrhp -n kube-system
NAME                                  CPU(cores)   MEMORY(bytes)
rke2-ingress-nginx-controller-trrhp   12m          994Mi   <<<
```
With `worker_processes` hardcoded to 8:
```
kubectl top pods rke2-ingress-nginx-controller-dhgwh -n kube-system
NAME                                  CPU(cores)   MEMORY(bytes)
rke2-ingress-nginx-controller-dhgwh   4m           104Mi
```
## CI configuration
Below you can choose test deployment variants to run in this MR's CI.
Click to open the CI configuration
Legend:

| Icon | Meaning | Available values |
|---|---|---|
| ☁️ | Infra Provider | capd, capo, capm3 |
| 🚀 | Bootstrap Provider | kubeadm (alias kadm), rke2 |
| 🐧 | Node OS | ubuntu, suse |
| 🛠️ | Deployment Options | light-deploy, dev-sources, ha, misc, maxsurge-0, logging |
| 🎬 | Pipeline Scenarios | Available scenario list and description |
- [ ] 🎬 preview ☁️ capd 🚀 kadm 🐧 ubuntu
- [ ] 🎬 preview ☁️ capo 🚀 rke2 🐧 suse
- [ ] 🎬 preview ☁️ capm3 🚀 rke2 🐧 ubuntu
- [ ] ☁️ capd 🚀 kadm 🛠️ light-deploy 🐧 ubuntu
- [ ] ☁️ capd 🚀 rke2 🛠️ light-deploy 🐧 suse
- [ ] ☁️ capo 🚀 rke2 🐧 suse
- [ ] ☁️ capo 🚀 kadm 🐧 ubuntu
- [ ] ☁️ capo 🚀 rke2 🎬 rolling-update 🛠️ ha 🐧 ubuntu
- [ ] ☁️ capo 🚀 kadm 🎬 wkld-k8s-upgrade 🐧 ubuntu
- [ ] ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha,misc 🐧 suse
- [ ] ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.3.x 🛠️ ha,misc 🐧 ubuntu
- [ ] ☁️ capm3 🚀 rke2 🐧 suse
- [ ] ☁️ capm3 🚀 kadm 🐧 ubuntu
- [ ] ☁️ capm3 🚀 kadm 🎬 rolling-update-no-wkld 🛠️ ha,misc 🐧 ubuntu
- [ ] ☁️ capm3 🚀 rke2 🎬 wkld-k8s-upgrade 🛠️ ha 🐧 suse
- [ ] ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 ubuntu
- [ ] ☁️ capm3 🚀 rke2 🎬 sylva-upgrade-from-1.3.x 🛠️ misc,ha 🐧 suse
- [ ] ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 suse
Global config for deployment pipelines:

- [ ] autorun pipelines
- [ ] allow failure on pipelines
- [ ] record sylvactl events
Notes:

- Enabling `autorun` will make deployment pipelines run automatically without human interaction.
- Disabling `allow failure` will make deployment pipelines mandatory for pipeline success.
- If both `autorun` and `allow failure` are disabled, deployment pipelines will need manual triggering but will block the pipeline.
Be aware: after a configuration change, the pipeline is not triggered automatically.
Please run it manually (by clicking the "Run pipeline" button in the Pipelines tab) or push new code.