Rolling upgrade of MD failing with "Node password rejected, duplicate hostname" error
This issue has been observed on several CI runs, for example: https://gitlab.com/sylva-projects/sylva-core/-/jobs/7290773682
The node is stuck in cloud-init, in the runcmd script (/var/lib/cloud/instance/scripts/runcmd). Logs from the node:
rke2-capm3-virt-management-md-0 systemd[1]: Starting Rancher Kubernetes Engine v2 (agent)...
Jul 8 22:24:20 mgmt-1365192642-rke2-capm3-virt-management-md-0 sh[1267]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Jul 8 22:24:20 mgmt-1365192642-rke2-capm3-virt-management-md-0 sh[1268]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Jul 8 22:24:20 mgmt-1365192642-rke2-capm3-virt-management-md-0 kernel: [ 59.842859] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
Jul 8 22:24:20 mgmt-1365192642-rke2-capm3-virt-management-md-0 kernel: [ 59.844141] Bridge firewalling registered
Jul 8 22:24:20 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:24:20Z" level=warning msg="cis-1.23 profile is deprecated and will be removed in v1.29. Please use cis instead."
Jul 8 22:24:20 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:24:20Z" level=info msg="Applying Pod Security Admission Configuration"
Jul 8 22:24:20 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:24:20Z" level=warning msg="cis-1.23 profile is deprecated and will be removed in v1.29. Please use cis instead."
Jul 8 22:24:20 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:24:20Z" level=info msg="Starting rke2 agent v1.28.8+rke2r1 (42cab2f61939504cb17073e47deaea0b29fe2c1b)"
Jul 8 22:24:20 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:24:20Z" level=info msg="Adding server to load balancer rke2-agent-load-balancer: 192.168.100.2:9345"
Jul 8 22:24:20 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:24:20Z" level=info msg="Running load balancer rke2-agent-load-balancer 127.0.0.1:6444 -> [192.168.100.2:9345] [default: 192.168.100.2:9345]"
Jul 8 22:24:20 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:24:20Z" level=warning msg="Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation."
Jul 8 22:24:20 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:24:20Z" level=info msg="Adding server to load balancer rke2-api-server-agent-load-balancer: 192.168.100.2:6443"
Jul 8 22:24:20 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:24:20Z" level=info msg="Running load balancer rke2-api-server-agent-load-balancer 127.0.0.1:6443 -> [192.168.100.2:6443] [default: 192.168.100.2:6443]"
Jul 8 22:24:21 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:24:21Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:24:31 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:24:31Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:24:41 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:24:41Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:24:52 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:24:52Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:25:02 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:25:02Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:25:12 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:25:12Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:25:20 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:25:20Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:25:28 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:25:28Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:25:34 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:25:34Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:25:43 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:25:43Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:25:51 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:25:51Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:26:01 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:26:01Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:26:09 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:26:09Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:26:19 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:26:19Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:26:24 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:26:24Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:26:35 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:26:35Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:26:45 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:26:45Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:26:52 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:26:52Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
Jul 8 22:26:58 mgmt-1365192642-rke2-capm3-virt-management-md-0 rke2[1273]: time="2024-07-08T22:26:58Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
...
The error message:
Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag
seems related to the fact that we set nodeReuse and therefore reuse the same hostname (the BMH name) for our K8S nodes.
This issue (https://github.com/k3s-io/k3s/issues/802) mentions that the password presented by the rke2 agent is checked against one stored in a K8S secret (the secret name is prefixed with the node name). See also https://docs.k3s.io/architecture#how-agent-node-registration-works
kube-system mgmt-1365192642-rke2-capm3-virt-management-cp-0.node-password.rke2 Opaque 1 28m
kube-system mgmt-1365192642-rke2-capm3-virt-management-cp-1.node-password.rke2 Opaque 1 125m
kube-system mgmt-1365192642-rke2-capm3-virt-management-cp-2.node-password.rke2 Opaque 1 43m
kube-system mgmt-1365192642-rke2-capm3-virt-management-md-0.node-password.rke2 Opaque 1 56m
sylva-system mgmt-1365192642-rke2-capm3-virt-md0-9gh4h-pnghk mgmt-1365192642-rke2-capm3-virt Provisioning 57m v1.28.8+rke2r1
In our case, the secret seems to have been recreated.
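To make the suspected failure mode concrete, here is a minimal Python sketch of the registration check described above. It is a toy model, not the rke2 implementation: the class `NodePasswordStore`, the helper `new_node_password`, the short node name, and the salted SHA-256 hash are all assumptions chosen for illustration. It only shows why a reprovisioned node that keeps its hostname but has lost `/etc/rancher/node/password` would be rejected until the stale server-side entry is removed.

```python
import hashlib
import secrets


class NodePasswordStore:
    """Toy stand-in for the server-side <node>.node-password.rke2 secrets."""

    def __init__(self):
        self._hashes = {}  # node name -> (salt, digest)

    @staticmethod
    def _digest(salt: bytes, password: str) -> bytes:
        # Assumption: salted SHA-256 stands in for the hash rke2 really uses.
        return hashlib.sha256(salt + password.encode()).digest()

    def register(self, node_name: str, password: str) -> int:
        """Return an HTTP-like status code for an agent registration attempt."""
        if node_name not in self._hashes:
            # First time this node name is seen: remember its password hash.
            salt = secrets.token_bytes(16)
            self._hashes[node_name] = (salt, self._digest(salt, password))
            return 200
        salt, expected = self._hashes[node_name]
        if secrets.compare_digest(expected, self._digest(salt, password)):
            return 200
        return 403  # corresponds to "hash does not match" on the server

    def delete(self, node_name: str) -> None:
        """Model deleting the node password secret for a node name."""
        self._hashes.pop(node_name, None)


def new_node_password() -> str:
    """Model the random password an agent generates on a fresh disk."""
    return secrets.token_hex(16)


store = NodePasswordStore()
node = "management-md-0"  # hypothetical short name, reused across reprovisioning

old_password = new_node_password()
first = store.register(node, old_password)    # initial join
rejoin = store.register(node, old_password)   # restart, same disk
reprovisioned = store.register(node, new_node_password())  # fresh disk, same hostname
store.delete(node)                            # stale entry removed
after_cleanup = store.register(node, new_node_password())

print(first, rejoin, reprovisioned, after_cleanup)  # 200 200 403 200
```

In this model the third attempt fails for the same reason as in the logs: the hostname is unchanged, but the password on the freshly provisioned machine is new.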
Logs from cp-1 (syslog):
Jul 8 23:15:51 mgmt-1365192642-rke2-capm3-virt-management-cp-1 rke2[1272]: time="2024-07-08T23:15:51Z" level=error msg="Sending HTTP 403 response to 192.168.100.21:38064: unable to verify password for node mgmt-1365192642-rke2-capm3-virt-management-md-0: hash does not match"
Jul 8 23:15:58 mgmt-1365192642-rke2-capm3-virt-management-cp-1 rke2[1272]: time="2024-07-08T23:15:58Z" level=error msg="Sending HTTP 403 response to 192.168.100.21:65257: unable to verify password for node mgmt-1365192642-rke2-capm3-virt-management-md-0: hash does not match"
Jul 8 23:16:17 mgmt-1365192642-rke2-capm3-virt-management-cp-1 rke2[1272]: time="2024-07-08T23:16:17Z" level=error msg="Sending HTTP 403 response to 192.168.100.21:45821: unable to verify password for node mgmt-1365192642-rke2-capm3-virt-management-md-0: hash does not match"
Jul 8 23:16:18 mgmt-1365192642-rke2-capm3-virt-management-cp-1 systemd[1]: run-containerd-runc-k8s.io-a5b6e432445cebe4c5832ca5dd278856d72f419723dc346f4d5bb2da1eebfae8-runc.icpnOb.mount: Deactivated successfully.
Jul 8 23:16:33 mgmt-1365192642-rke2-capm3-virt-management-cp-1 rke2[1272]: time="2024-07-08T23:16:33Z" level=error msg="Sending HTTP 403 response to 192.168.100.21:53479: unable to verify password for node mgmt-1365192642-rke2-capm3-virt-management-md-0: hash does not match"
Jul 8 23:16:51 mgmt-1365192642-rke2-capm3-virt-management-cp-1 systemd[1]: run-containerd-runc-k8s.io-fdd790c39c8664400b94b0552ee6e6fd0ce82382516045e9283e02cc5a28c685-runc.BLIoKD.mount: Deactivated successfully.
Jul 8 23:17:13 mgmt-1365192642-rke2-capm3-virt-management-cp-1 rke2[1272]: time="2024-07-08T23:17:13Z" level=error msg="Sending HTTP 403 response to 192.168.100.21:42340: unable to verify password for node mgmt-1365192642-rke2-capm3-virt-management-md-0: hash does not match"
Jul 8 23:17:18 mgmt-1365192642-rke2-capm3-virt-management-cp-1 systemd[1]: run-containerd-runc-k8s.io-a5b6e432445cebe4c5832ca5dd278856d72f419723dc346f4d5bb2da1eebfae8-runc.KAGnke.mount: Deactivated successfully.
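When checking other CI runs for the same symptom, the server-side rejections can be counted per node straight from the control-plane syslog. The following is a small sketch, assuming the log format shown above; the function name `count_rejections` and the regular expression are illustrative, and the sample lines are copied from the cp-1 excerpt.

```python
import re
from collections import Counter

# Matches the rejection lines emitted by the rke2 server in the excerpt above.
REJECT_RE = re.compile(
    r"Sending HTTP 403 response to (?P<peer>[\d.]+):\d+: "
    r"unable to verify password for node (?P<node>[^\s:]+): hash does not match"
)


def count_rejections(lines):
    """Count 'hash does not match' rejections per (node name, source IP)."""
    counts = Counter()
    for line in lines:
        match = REJECT_RE.search(line)
        if match:
            counts[(match.group("node"), match.group("peer"))] += 1
    return counts


sample = [
    'Jul 8 23:15:51 mgmt-1365192642-rke2-capm3-virt-management-cp-1 rke2[1272]: time="2024-07-08T23:15:51Z" level=error msg="Sending HTTP 403 response to 192.168.100.21:38064: unable to verify password for node mgmt-1365192642-rke2-capm3-virt-management-md-0: hash does not match"',
    'Jul 8 23:15:58 mgmt-1365192642-rke2-capm3-virt-management-cp-1 rke2[1272]: time="2024-07-08T23:15:58Z" level=error msg="Sending HTTP 403 response to 192.168.100.21:65257: unable to verify password for node mgmt-1365192642-rke2-capm3-virt-management-md-0: hash does not match"',
    "Jul 8 23:16:18 mgmt-1365192642-rke2-capm3-virt-management-cp-1 systemd[1]: run-containerd-runc-k8s.io-a5b6e432445cebe4c5832ca5dd278856d72f419723dc346f4d5bb2da1eebfae8-runc.icpnOb.mount: Deactivated successfully.",
]

counts = count_rejections(sample)
for (node_name, peer_ip), n in counts.items():
    print(f"{node_name} from {peer_ip}: {n} rejection(s)")
```

On a real run, the lines would come from the collected syslog file instead of the `sample` list.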
Edited by Remi Le Trocquer