OOM helm-controller during the sylva-units-preview (bootstrap)
Job #11364387696 failed for c10a40e1:
The job fails with a timeout during the bootstrap phase while deploying sylva-units-preview. It seems that the helm-controller has restarted twice:
flux-system helm-controller-5c48699fbb-9k7ml 1/1 Running 2 (70s ago) 3m43s 100.100.0.3 bootstrap-2041384449-rke2-capo-control-plane
Events.log:
BackOff Back-off restarting failed container manager in pod helm-controller-5c48699fbb-9k7ml_flux-system
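For future occurrences, the same events can also be pulled live from the bootstrap cluster (pod name taken from the output above) with something like:

kubectl -n flux-system get events --field-selector involvedObject.name=helm-controller-5c48699fbb-9k7ml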
It is probably due to an OOM kill of the helm-controller (systemd.logs):
\"cd781dc4c20033aaea4ca4016da8ff0c0f4f0eaa814375334689b606c3490322\" for &ContainerMetadata{Name:manager,Attempt:1,} returns container id \"465812023f4868f88cdb10f0aa747609f8ab81842892d89041353719fa06d0ce\""
Sep 16 01:40:11 bootstrap-2041384449-rke2-capo-control-plane containerd[168]: time="2025-09-16T01:40:11.646728215Z" level=info msg="StartContainer for \"465812023f4868f88cdb10f0aa747609f8ab81842892d89041353719fa06d0ce\""
Sep 16 01:40:11 bootstrap-2041384449-rke2-capo-control-plane systemd[1]: Started cri-containerd-465812023f4868f88cdb10f0aa747609f8ab81842892d89041353719fa06d0ce.scope - libcontainer container 465812023f4868f88cdb10f0aa747609f8ab81842892d89041353719fa06d0ce.
Sep 16 01:40:11 bootstrap-2041384449-rke2-capo-control-plane containerd[168]: time="2025-09-16T01:40:11.876621534Z" level=info msg="StartContainer for \"465812023f4868f88cdb10f0aa747609f8ab81842892d89041353719fa06d0ce\" returns successfully"
Sep 16 01:41:36 bootstrap-2041384449-rke2-capo-control-plane systemd[1]: cri-containerd-465812023f4868f88cdb10f0aa747609f8ab81842892d89041353719fa06d0ce.scope: A process of this unit has been killed by the OOM killer.
Sep 16 01:41:36 bootstrap-2041384449-rke2-capo-control-plane containerd[168]: time="2025-09-16T01:41:36.698611385Z" level=info msg="TaskOOM event container_id:\"465812023f4868f88cdb10f0aa747609f8ab81842892d89041353719fa06d0ce\""
Sep 16 01:41:36 bootstrap-2041384449-rke2-capo-control-plane systemd[1]: cri-containerd-465812023f4868f88cdb10f0aa747609f8ab81842892d89041353719fa06d0ce.scope: Deactivated successfully.
Sep 16 01:41:36 bootstrap-2041384449-rke2-capo-control-plane systemd[1]: cri-containerd-465812023f4868f88cdb10f0aa747609f8ab81842892d89041353719fa06d0ce.scope: Consumed 1min 8.040s CPU time.
Sep 16 01:41:37 bootstrap-2041384449-rke2-capo-control-plane systemd[1]: run-containerd-io.containerd.runtime.v2.task-k8s.io-465812023f4868f88cdb10f0aa747609f8ab81842892d89041353719fa06d0ce-rootfs.mount: Deactivated successfully.
Sep 16 01:41:37 bootstrap-2041384449-rke2-capo-control-plane containerd[168]: time="2025-09-16T01:41:37.124691053Z" level=info msg="shim disconnected" id=465812023f4868f88cdb10f0aa747609f8ab81842892d89041353719fa06d0ce namespace=k8s.io
Sep 16 01:41:37 bootstrap-2041384449-rke2-capo-control-plane containerd[168]: time="2025-09-16T01:41:37.125455472Z" level=warning msg="cleaning up after shim disconnected" id=465812023f4868f88cdb10f0aa747609f8ab81842892d89041353719fa06d0ce namespace=k8s.io
Sep 16 01:41:37 bootstrap-2041384449-rke2-capo-control-plane containerd[168]: time="2025-09-16T01:41:37.125526030Z" level=info msg="cleaning up dead shim" namespace=k8s.io
Sep 16 01:41:37 bootstrap-2041384449-rke2-capo-control-plane kubelet[764]: I0916 01:41:37.957852 764 scope.go:117] "RemoveContainer" containerID="ecf8dbaa8a3045410c8e1b8253a3031b29fc56e18ac7b87c43412d93097a3c80"
Sep 16 01:41:37 bootstrap-2041384449-rke2-capo-control-plane kubelet[764]: I0916 01:41:37.958940 764 scope.go:117] "RemoveContainer" containerID="465812023f4868f88cdb10f0aa747609f8ab81842892d89041353719fa06d0ce"
Sep 16 01:41:37 bootstrap-2041384449-rke2-capo-control-plane kubelet[764]: E0916 01:41:37.960085 764 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"manager\" with CrashLoopBackOff: \"back-off 10s restarting failed container=manager pod=helm-controller-5c48699fbb-9k7ml_flux-system(474f0766-b2f3-40fe-ac32-83f4fb218ec9)\"" pod="flux-system/helm-controller-5c48699fbb-9k7ml" podUID="474f0766-b2f3-40fe-ac32-83f4fb218ec9"
Sep 16 01:41:37 bootstrap-2041384449-rke2-capo-control-plane containerd[168]: time="2025-09-16T01:41:37.962350235Z" level=info msg="RemoveContainer for \"ecf8dbaa8a3045410c8e1b8253a3031b29fc56e18ac7b87c43412d93097a3c80\""
Sep 16 01:41:37 bootstrap-2041384449-rke2-capo-control-plane containerd[168]: time="2025-09-16T01:41:37.973399875Z" level=info msg="RemoveContainer for \"ecf8dbaa8a3045410c8e1b8253a3031b29fc56e18ac7b87c43412d93097a3c80\" returns successfully"
Sep 16 01:41:45 bootstrap-2041384449-rke2-capo-control-plane kubelet[764]: I0916 01:41:45.697622 764 scope.go:117] "RemoveContainer" containerID="465812023f4868f88cdb10f0aa747609f8ab81842892d89041353719fa06d0ce"
Sep 16 01:41:45 bootstrap-2041384449-rke2-capo-control-plane kubelet[764]: E0916 01:41:45.697900 764 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"manager\" with CrashLoopBackOff: \"back-off 10s restarting failed container=manager pod=helm-controller-5c48699fbb-9k7ml_flux-system(474f0766-b2f3-40fe-ac32-83f4fb218ec9)\"" pod="flux-system/helm-controller-5c48699fbb-9k7ml" podUID="474f0766-b2f3-40fe-ac32-83f4fb218ec9"
Sep 16 01:41:58 bootstrap-2041384449-rke2-capo-control-plane kubelet[764]: I0916 01:41:58.859175 764 scope.go:117] "RemoveContainer" containerID="465812023f4868f88cdb10f0aa747609f8ab81842892d89041353719fa06d0ce"
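As long as the pod is still around, the OOM kill can also be confirmed from the container status, and the memory requests/limits currently applied to the helm-controller can be checked. A minimal sketch, assuming the standard Flux deployment name and that manager is its only container:

kubectl -n flux-system get pod helm-controller-5c48699fbb-9k7ml \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# should print OOMKilled for this kind of failure

kubectl -n flux-system get deployment helm-controller \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'
# prints the resources block (requests/limits) currently set on the manager container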
We need to investigate why this OOM occurs randomly (seen twice (and only?) in
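One possible way to investigate (a rough sketch, assuming metrics-server is available on the bootstrap cluster and that the pod carries the usual app=helm-controller label) is to sample the helm-controller memory usage during the next bootstrap runs and compare the peak against the configured limit:

# sample helm-controller memory usage every 10s while the bootstrap is running
while true; do
  date
  kubectl -n flux-system top pod -l app=helm-controller --containers
  sleep 10
done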