Ingress-nginx - restrict the number of worker-processes

What does this MR do and why?

As explained in issue #2361 (closed), nginx is subject to OOM kills because it exceeds the memory limit recently set at 1G.
Nginx memory consumption depends heavily on the `worker_processes` parameter, which is set to `auto` by default. In that case the number of workers matches the number of CPUs on the host machine (or the CPU limit set on the pod side): https://nginx.org/en/docs/ngx_core_module.html#worker_processes
On bare-metal deployments this number can be high (> 96), which leads to high memory consumption as soon as the pod starts up (as explained in issue #2361 (closed)). This is why, in Sylva CI, we only hit this OOM problem on the virt-capm3 bootstrap cluster and not on the management cluster: the management cluster nodes have fewer CPUs (https://gitlab.com/sylva-projects/sylva-core/-/blob/main/environment-values/base-capm3-virt/high-availability/high-availability.yaml?ref_type=heads#L660).
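One way to confirm the effective worker count is to count the nginx worker processes inside the controller pod. A sketch, assuming a cluster is reachable (the pod name below is illustrative):

```shell
# Count nginx worker processes inside the ingress controller pod
# (pod name is a placeholder; adjust to your deployment)
kubectl exec -n kube-system rke2-ingress-nginx-controller-trrhp -- \
  sh -c 'ps -ef | grep -c "[n]ginx: worker process"'
```

With `worker_processes: auto` on a large bare-metal node, this count should match the node's CPU count (or the pod's CPU limit).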

It is recommended not to exceed 24 workers (https://github.com/kubernetes/ingress-nginx/issues/3574#issuecomment-448229118). It is hard to pick a single value that suits all our deployment cases (VMs, bare metal, laptops, etc.).

8 seems to me a reasonable value for most cases (although, as indicated in the documentation, too few workers can cause performance problems).
Another possibility would be to leave it at `auto` except for bare-metal cases (and set it to 24 there, for example). But that still leaves the bootstrap cluster, which can run on just about anything.
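For reference, a minimal sketch of how the value could be pinned on RKE2, assuming the packaged rke2-ingress-nginx chart accepts the standard ingress-nginx `controller.config` values (the `worker-processes` ConfigMap key maps to nginx's `worker_processes` directive):

```yaml
# HelmChartConfig overriding the packaged rke2-ingress-nginx chart
# (sketch; the exact chart/values layout may differ per RKE2 version)
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      config:
        worker-processes: "8"
```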

Related reference(s)

Close #2361 (closed)

Test coverage

Tested on capm3 (real bare metal) and capo.
After tuning the parameter, I ran some tests with the `ab` tool (apache2-utils) and did not see any degradation in the results. However, this in no way guarantees that there is no loss in a real-world workload.
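A sketch of the kind of `ab` run used for this comparison, assuming an endpoint exposed through the ingress (the hostname below is a placeholder, not the actual test target):

```shell
# 10,000 requests with 100 concurrent clients, keep-alive enabled,
# against an ingress-exposed endpoint (hostname is a placeholder)
ab -n 10000 -c 100 -k https://ingress.example.com/
```

Comparing the requests-per-second and latency percentiles reported by `ab` before and after the change is what "no loss in the results" refers to above.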

With `worker_processes` set to `auto` (96, if I remember correctly):

```
kubectl top pod rke2-ingress-nginx-controller-trrhp -n kube-system
NAME                                  CPU(cores)   MEMORY(bytes)
rke2-ingress-nginx-controller-trrhp   12m          994Mi <<<
```

With `worker_processes` hardcoded to 8:

```
kubectl top pods rke2-ingress-nginx-controller-dhgwh -n kube-system
NAME                                  CPU(cores)   MEMORY(bytes)
rke2-ingress-nginx-controller-dhgwh   4m           104Mi
```

CI configuration

Below you can choose test deployment variants to run in this MR's CI.


Legend:

| Icon | Meaning | Available values |
| --- | --- | --- |
| ☁️ | Infra Provider | capd, capo, capm3 |
| 🚀 | Bootstrap Provider | kubeadm (alias kadm), rke2 |
| 🐧 | Node OS | ubuntu, suse |
| 🛠️ | Deployment Options | light-deploy, dev-sources, ha, misc, maxsurge-0, logging |
| 🎬 | Pipeline Scenarios | Available scenario list and description |
  • 🎬 preview ☁️ capd 🚀 kadm 🐧 ubuntu

  • 🎬 preview ☁️ capo 🚀 rke2 🐧 suse

  • 🎬 preview ☁️ capm3 🚀 rke2 🐧 ubuntu

  • ☁️ capd 🚀 kadm 🛠️ light-deploy 🐧 ubuntu

  • ☁️ capd 🚀 rke2 🛠️ light-deploy 🐧 suse

  • ☁️ capo 🚀 rke2 🐧 suse

  • ☁️ capo 🚀 kadm 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capo 🚀 kadm 🎬 wkld-k8s-upgrade 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha,misc 🐧 suse

  • ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.3.x 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🐧 suse

  • ☁️ capm3 🚀 kadm 🐧 ubuntu

  • ☁️ capm3 🚀 kadm 🎬 rolling-update-no-wkld 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 wkld-k8s-upgrade 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 sylva-upgrade-from-1.3.x 🛠️ misc,ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 suse

Global config for deployment pipelines

  • autorun pipelines
  • allow failure on pipelines
  • record sylvactl events

Notes:

  • Enabling autorun makes deployment pipelines run automatically, without human interaction
  • Disabling allow failure makes deployment pipelines mandatory for pipeline success
  • If both autorun and allow failure are disabled, deployment pipelines need manual triggering but will block the pipeline

Be aware: after a configuration change, the pipeline is not triggered automatically. Please run it manually (by clicking the Run pipeline button in the Pipelines tab) or push new code.

Edited by Remi Le Trocquer
