Libvirt-metal Rancher-generated workload cluster kubeconfig returns invalid cert SAN

Summary

The "workload-kubeconfig" test:

    - echo "-- Testing workload cluster. Job started at '$CI_JOB_STARTED_AT'."
    # the $WORKLOAD_CLUSTER_NAME-rancher.yaml file below is the kubeconfig previously downloaded from the Rancher server through Selenium
    - |
      attempts=0; max_attempts=5
      until kubectl run test-sso --image=registry.k8s.io/pause:3.9 --kubeconfig $WORKLOAD_CLUSTER_NAME-rancher.yaml --overrides='{"apiVersion": "v1","spec": {"containers": [{"name": "test","image": "registry.k8s.io/pause:3.9","securityContext": {"allowPrivilegeEscalation": false,"capabilities": {"drop": ["ALL"]},"runAsNonRoot": true,"runAsGroup": 1000,"runAsUser": 1000,"seccompProfile": {"type": "RuntimeDefault"}}}]}}'; do
        sleep 3
        ((attempts++)) && ((attempts==max_attempts)) && exit -1
      done
    - echo "-- Wait for test-sso pod to be created"
    - kubectl wait --for=condition=Ready pod/test-sso --kubeconfig $WORKLOAD_CLUSTER_NAME-rancher.yaml --timeout=60s
    - echo "-- All done"

in CI jobs

  • test-sso+workload-kubeconfig
  • test-no-sso+workload-kubeconfig

is failing with:

-- Testing workload cluster. Job started at '2024-06-14T06:39:18Z'.
$ attempts=0; max_attempts=5 # collapsed multi-line command
E0614 06:41:58.324731    2047 memcache.go:265] couldn't get current server API group list: Get "https://rancher.172.18.0.2.nip.io/k8s/clusters/c-m-9j44t6m5/api?timeout=32s": tls: failed to verify certificate: x509: certificate is valid for ingress.local, not rancher.172.18.0.2.nip.io
Unable to connect to the server: tls: failed to verify certificate: x509: certificate is valid for ingress.local, not rancher.172.18.0.2.nip.io
E0614 06:42:01.370647    2070 memcache.go:265] couldn't get current server API group list: Get "https://rancher.172.18.0.2.nip.io/k8s/clusters/c-m-9j44t6m5/api?timeout=32s": tls: failed to verify certificate: x509: certificate is valid for ingress.local, not rancher.172.18.0.2.nip.io
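
To narrow this down, it may help to check which names the certificate served at the Rancher URL is actually valid for. The following is a minimal sketch, not part of the CI job: `fetch_server_cert` and `cert_matches_host` are hypothetical helper names, and it assumes `openssl` is available on the runner and that the endpoint listens on port 443.

```shell
# Hypothetical helper: print the leaf certificate (PEM) that the server
# presents for the given hostname (sent as SNI). Port defaults to 443 (assumption).
fetch_server_cert() {
  local host=$1 port=${2:-443}
  openssl s_client -connect "${host}:${port}" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -outform PEM
}

# Hypothetical helper: succeed only if the PEM certificate in file $1 is valid
# for hostname $2. "openssl x509 -checkhost" prints either
# "Hostname <host> does match certificate" or "... does NOT match certificate".
cert_matches_host() {
  local cert_file=$1 host=$2
  openssl x509 -in "$cert_file" -noout -checkhost "$host" \
    | grep -q 'does match certificate'
}
```

For example, `fetch_server_cert rancher.172.18.0.2.nip.io > served.pem` followed by `cert_matches_host served.pem rancher.172.18.0.2.nip.io` should fail while the problem is present, and `cert_matches_host served.pem ingress.local` should succeed if the served certificate is the one named in the error above.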

I saw this same error at some point and worked on !1488 (closed) for it, but had not noticed it again since then.

CC: @mederic.deverdilhac @bogdan.antohe

Edit 18.07: this seems to occur only on the capm3-ha-kubeadm-virt-ubuntu variant of the two jobs. Additional examples:

  • https://gitlab.com/sylva-projects/sylva-core/-/jobs/7291589435
  • https://gitlab.com/sylva-projects/sylva-core/-/jobs/7357667617
  • https://gitlab.com/sylva-projects/sylva-core/-/jobs/7279703844
  • https://gitlab.com/sylva-projects/sylva-core/-/jobs/7331190231

Edited Jul 18, 2024 by Bogdan-Adrian Burciu