need for HelmRelease retries in bootstrap cluster
https://gitlab.com/sylva-projects/sylva-core/-/jobs/7761906694
what happens in this job is that the bootstrap fails:
- the
clusterHelmRelease is stuck:
conditions:
- lastTransitionTime: "2024-09-06T09:29:00Z"
message: Failed to install after 1 attempt(s)
observedGeneration: 1
reason: RetriesExceeded
status: "True"
type: Stalled
- lastTransitionTime: "2024-09-06T09:29:00Z"
message: "Helm install failed for release sylva-system/cluster with chart sylva-capi-cluster@0.0.0+70529d074d69:
5 errors occurred:\n\t* Internal error occurred: failed calling webhook \"default.metal3cluster.infrastructure.cluster.x-k8s.io\":
failed to call webhook: Post \"https://capm3-webhook-service.capm3-system.svc:443/mutate-infrastructure-cluster-x-k8s-io-v1beta1-metal3cluster?timeout=10s\":
dial tcp 100.96.198.18:443: connect: connection refused\n\t* Internal error
- this seems to be due to capm3 Kustomization becoming ready before the service/pod is actually fully ready
The kustomize-units/kyverno-policies/generic/components/management-cluster-only/force-reconcile-helmreleases.yaml Kyverno would have solved this issue... but we don't yet enable it in the bootstrap cluster... we need to do that!
/cc @feleouet