need for HelmRelease retries in bootstrap cluster

https://gitlab.com/sylva-projects/sylva-core/-/jobs/7761906694

what happens in this job is that the bootstrap fails:

  • the cluster HelmRelease is stuck:
    conditions:
    - lastTransitionTime: "2024-09-06T09:29:00Z"
      message: Failed to install after 1 attempt(s)
      observedGeneration: 1
      reason: RetriesExceeded
      status: "True"
      type: Stalled
    - lastTransitionTime: "2024-09-06T09:29:00Z"
      message: "Helm install failed for release sylva-system/cluster with chart sylva-capi-cluster@0.0.0+70529d074d69:
        5 errors occurred:\n\t* Internal error occurred: failed calling webhook \"default.metal3cluster.infrastructure.cluster.x-k8s.io\":
        failed to call webhook: Post \"https://capm3-webhook-service.capm3-system.svc:443/mutate-infrastructure-cluster-x-k8s-io-v1beta1-metal3cluster?timeout=10s\":
        dial tcp 100.96.198.18:443: connect: connection refused\n\t* Internal error
  • this seems to be due to capm3 Kustomization becoming ready before the service/pod is actually fully ready

The kustomize-units/kyverno-policies/generic/components/management-cluster-only/force-reconcile-helmreleases.yaml Kyverno would have solved this issue... but we don't yet enable it in the bootstrap cluster... we need to do that!

/cc @feleouet

Assignee Loading
Time tracking Loading