Cattle-agent pods are not properly deployed

Summary

I've noticed in some deployments that cattle-agent pods are not deployed and stuck into CrashLoopBackOff and looking into logs seems like the cluster is not able to resolve the dns coresponding to rancher :

569dc76669-8zr75 CATTLE_RANCHER_PROVISIONING_CAPI_VERSION=105.1.0+up0.6.0 CATTLE_RANCHER_WEBHOOK_VERSION=105.0.3+up0.6.4 CATTLE_SERVER=https://rancher.sylva CATTLE_SERVER_VERSION=v2.10.3
INFO: Using resolv.conf: search cattle-system.svc.cluster.local svc.cluster.local cluster.local nameserver 100.73.0.10 options ndots:5
ERROR: https://rancher.sylva/ping is not accessible (Could not resolve host: rancher.sylva)

In order to know which ingresses are exposed by the mgmt cluster we have defined coredns unit responsible to configure the dns on the workload cluster to set the displayed ip and services that are necessary to communicate ( ex: rancher, thanos )

But even if the flux reported the unit coredns successfully applied on workload cluster, rke2 is overwriting the configmap to default values and caused the inability to resolve properly the dns and in our case blocking the importing process.

I have noticed this issue only on rke2 based deployments.

cc: @rletrocquer

related references

Details

Assignee Loading
Time tracking Loading