Address OKD management cluster needs for static DNS entries of each workload cluster

The following discussion from !2654 (merged) should be addressed:

  • @baburciu started a discussion:

    I've learnt from @cristian.manda more about the DNS issues mentioned in !2654 (comment 2075463540) and https://sylva-projects.slack.com/archives/C04BQQANHA5/p1724679931890629, and I understand this implies a new requirement: the management cluster OKD pods (from one or both of the new units, most likely the okd-assisted-installer one) need to be able to resolve DNS A records like:

    api.<workload cluster name>.<workload cluster baseDomain>
    api-int.<workload cluster name>.<workload cluster baseDomain>
    apps.<workload cluster name>.<workload cluster baseDomain>

    mapped to the workload cluster IP used to expose the Kubernetes API.

    In terms of -n WORKLOAD_CLUSTER_NAMESPACE HelmRelease/sylva-units values, these mappings look like

    | FQDN | IP |
    |------|----|
    | {{ printf "%s.%s.%s" "api" .Values.cluster.name .Values.cluster.okd.baseDomain }} | {{ .Values.cluster.cluster_virtual_ip }} |
    | {{ printf "%s.%s.%s" "api-int" .Values.cluster.name .Values.cluster.okd.baseDomain }} | {{ .Values.cluster.cluster_virtual_ip }} |
    | {{ printf "%s.%s.%s" "apps" .Values.cluster.name .Values.cluster.okd.baseDomain }} | {{ .Values.cluster.cluster_virtual_ip }} |

    since this .Values.cluster.okd.baseDomain is used in the AgentControlPlane.spec.agentConfigSpec.baseDomain field of the CRD, per the s-c-c counterpart work.

    Please confirm whether that's the case, or explain what is actually needed if this understanding is not accurate.
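
    For illustration, with hypothetical example values cluster.name=workload1, cluster.okd.baseDomain=sylva and cluster_virtual_ip=192.168.100.3, these mappings would render to:

    api.workload1.sylva      192.168.100.3
    api-int.workload1.sylva  192.168.100.3
    apps.workload1.sylva     192.168.100.3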


    Now, for such a requirement I believe we can take the approach below:

    1. Have a workload-cluster-specific unit create a ConfigMap defining a CoreDNS hosts plugin block, like:
    # charts/sylva-units/workload-cluster.values.yaml
    units:
      coredns-custom-hosts-import:
        enabled: false
        info:
          description: create a ConfigMap containing the OKD workload cluster's DNS A records as a [CoreDNS hosts plugin](https://coredns.io/plugins/hosts/) block
          internal: true
        unit_templates:
        - base-deps
        enabled_conditions:
          - '{{ .Values.cluster.capi_providers.bootstrap_provider | eq "cabpob" }}'
        depends_on:
          kyverno-policies-ready: true
        repo: sylva-core
        kustomization_spec:
          path: ./kustomize-units/coredns-custom-hosts-import
          postBuild:
            substitute:
              CLUSTER_NAME: '{{ .Values.cluster.name }}'
              CLUSTER_VIRTUAL_IP: '{{ .Values.cluster.cluster_virtual_ip }}'
              CLUSTER_OKD_BASE_DOMAIN: '{{ .Values.cluster.okd.baseDomain }}'

    using a Kustomize resource like:

    # $ cat kustomize-units/coredns-custom-hosts-import/kustomization.yaml
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    resources:
    - configmap.yaml
    
    # $ cat kustomize-units/coredns-custom-hosts-import/configmap.yaml
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: coredns-custom-${CLUSTER_NAME}
      namespace: kube-system
      labels:
        sylva.io/coredns-custom-hosts-import: ${CLUSTER_NAME}
    data:
      custom.hosts: |
        ${CLUSTER_NAME}.hosts ${CLUSTER_NAME}.${CLUSTER_OKD_BASE_DOMAIN} {
            hosts {
                ${CLUSTER_VIRTUAL_IP} api.${CLUSTER_NAME}.${CLUSTER_OKD_BASE_DOMAIN}
                ${CLUSTER_VIRTUAL_IP} api-int.${CLUSTER_NAME}.${CLUSTER_OKD_BASE_DOMAIN}
                ${CLUSTER_VIRTUAL_IP} apps.${CLUSTER_NAME}.${CLUSTER_OKD_BASE_DOMAIN}
                fallthrough
            }
            whoami
        }
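
    As a quick sanity check (assuming a hypothetical workload cluster named workload1), the substituted ConfigMap can then be inspected on the management cluster with:

    kubectl -n kube-system get configmap coredns-custom-workload1 -o jsonpath='{.data.custom\.hosts}'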

    We'd enable this unit only for each workload cluster (that's why the definition was set directly in workload-cluster.values.yaml, but it's just a preference, it could also sit in the default sylva-units values).

    2. Then have a Kyverno ClusterPolicy that would watch every -n WORKLOAD_CLUSTER_NAMESPACE ConfigMap/coredns-custom-* and have the corresponding management cluster K8s resource reconciled, following the same approach as the one implemented by @feleouet in !1725 (merged).

    For CoreDNS, this would mean patching Deployment/coredns (for Kubeadm) or Deployment/rke2-coredns-rke2-coredns (for RKE2) (point for the future: do we have a different name in OKD? 🤔) by:

    • appending to .spec.template.spec.volumes:
    - configMap:
        defaultMode: 420
        items:
          - key: Corefile
            path: Corefile
        name: rke2-coredns-rke2-coredns
      name: config-volume
    - configMap:
        defaultMode: 420
        items:
          - key: custom.hosts
            path: custom.hosts
        name: coredns-custom-workload1
      name: coredns-custom-workload1
    • and appending .spec.template.spec.containers[0].volumeMounts with:
        - mountPath: /etc/coredns
          name: config-volume
        - mountPath: /etc/coredns/custom-workload1
          name: coredns-custom-workload1

    We could have the Kyverno ClusterPolicy as a Kustomize Component that is only deployed when the cabpob unit is enabled in the management cluster, like:

    units:
      kyverno-policies:
        :
        kustomization_spec:
          path: ./kustomize-units/kyverno-policies/generic
          :
          postBuild:
            substitute:
              :
              COREDNS_DEPLOYMENT_NAME: '{{ tuple (.Values.cluster.capi_providers.bootstrap_provider | eq "cabpk" | ternary "coredns" "rke2-coredns-rke2-coredns") (tuple . "cabpob" | include "unit-enabled") | include "set-only-if" }}'
          _components:
            :
            - '{{ tuple "components/coredns-custom-hosts-import" (tuple . "cabpob" | include "unit-enabled") | include "set-only-if" }}'

    # $ cat kustomize-units/kyverno-policies/generic/components/coredns-custom-hosts-import/kustomization.yaml
    ---
    apiVersion: kustomize.config.k8s.io/v1alpha1
    kind: Component
    
    resources:
    - coredns-deployment-patch.yaml
    
    # $ cat kustomize-units/kyverno-policies/generic/components/coredns-custom-hosts-import/coredns-deployment-patch.yaml
    ---
    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: mount-coredns-deployment-custom-hosts
      annotations:
        kustomize.toolkit.fluxcd.io/force: Enabled
        policies.kyverno.io/minversion: 1.12.0
        policies.kyverno.io/description: >-
          Patch the CoreDNS Deployment volumes and volumeMounts when a new ConfigMap carrying the 'sylva.io/coredns-custom-hosts-import' label is seen
    spec:
      mutateExistingOnPolicyUpdate: true
      webhookConfiguration:
        matchConditions:
        - name: is-coredns-custom-hosts-configmap
          expression: "has(object.metadata) && object.metadata.namespace == 'kube-system' && 'sylva.io/coredns-custom-hosts-import' in object.metadata.labels"
      rules:
      - name: mount-coredns-deployment-custom-hosts
        match:
          any:
          - resources:
              kinds:
              - ConfigMap
              namespaces:
              - kube-system
        mutate:
          targets:
          - apiVersion: apps/v1
            kind: Deployment
            name: ${COREDNS_DEPLOYMENT_NAME:-coredns}
            namespace: kube-system
          patchesJson6902: |-
            - op: add
              path: /spec/template/spec/volumes/-
              value:
                name: coredns-custom-{{ request.object.metadata.labels."sylva.io/coredns-custom-hosts-import" }}
                configMap:
                  defaultMode: 420
                  items:
                    - key: custom.hosts
                      path: custom.hosts
                  name: coredns-custom-{{ request.object.metadata.labels."sylva.io/coredns-custom-hosts-import" }}
            - op: add
              path: /spec/template/spec/containers/0/volumeMounts/-
              value:
                mountPath: /etc/coredns/custom-{{ request.object.metadata.labels."sylva.io/coredns-custom-hosts-import" }}
                name: coredns-custom-{{ request.object.metadata.labels."sylva.io/coredns-custom-hosts-import" }}
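
    Once the policy and a labeled ConfigMap are in place, a way to check that the target Deployment was actually mutated (assuming the RKE2 CoreDNS deployment name and a hypothetical workload cluster named workload1) could be:

    kubectl -n kube-system get deployment rke2-coredns-rke2-coredns \
      -o jsonpath='{.spec.template.spec.volumes[*].name}'
    # expected to list config-volume plus coredns-custom-workload1
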
    3. Have the coredns unit, already used in the management cluster, make use of the CoreDNS import plugin to inject the static DNS entries provided by each -n WORKLOAD_CLUSTER_NAMESPACE ConfigMap/coredns-custom-*, by calling it at the beginning of the Corefile:
    # kustomize-units/coredns/coredns-config.yaml
    
    data:
      Corefile: |
        import /etc/coredns/custom*/custom.hosts  # <===
        sylva:53 {
            errors
            forward ${CLUSTER_DOMAIN} ${CLUSTER_VIRTUAL_IP}
        }
        .:53 {
            errors
            health {
               lameduck 5s
            }
            ready
            kubernetes cluster.local in-addr.arpa ip6.arpa {
               pods insecure
               fallthrough in-addr.arpa ip6.arpa
               ttl 30
            }
            prometheus :9153
            forward . /etc/resolv.conf {
               max_concurrent 1000
            }
            cache 30
            loop
            reload
            loadbalance
        }

    which would effectively mean that every custom.hosts file under a directory matching the pattern /etc/coredns/custom* would have its contents injected into the actual Corefile.
    And such files are made available in the CoreDNS pod filesystem in the way presented in the previous points.
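
    For example, with two hypothetical workload clusters named workload1 and workload2, the CoreDNS container filesystem would contain:

    /etc/coredns/Corefile                        # from config-volume
    /etc/coredns/custom-workload1/custom.hosts   # from ConfigMap coredns-custom-workload1
    /etc/coredns/custom-workload2/custom.hosts   # from ConfigMap coredns-custom-workload2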

    After each patch triggered by a workload cluster creation, the CoreDNS pods would be recycled, and the OKD-specific controllers, assuming their pods use the ClusterFirst DNS policy, would then be able to resolve the workload-cluster-specific FQDNs.
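
    A simple way to follow that pod recycling (again assuming the RKE2 CoreDNS deployment name) would be:

    kubectl -n kube-system rollout status deployment/rke2-coredns-rke2-coredns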

    I did some testing prior to writing the above, and in principle the approach should work.

    Manual tests:

    Create the custom input ConfigMaps and alter the default CoreDNS ConfigMap (rke2-coredns-rke2-coredns) to allow the import:

    # coredns-dynamic-hosts-injection.yaml
    # following notes in https://docs.digitalocean.com/products/kubernetes/how-to/customize-coredns/
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: coredns-custom-test-workload1
      namespace: kube-system
    data:
      # log.override: |
      #   log
      # custom.server: |
      #   example.io:8053 {
      #     forward . 8.8.8.8
      #   }
      custom.hosts: |
        test-workload1.hosts test-workload1.sylva {
            hosts {
                192.168.100.3 api.test-workload1.sylva
                192.168.100.3 api-int.test-workload1.sylva
                192.168.100.3 apps.test-workload1.sylva            
                fallthrough
            }
            whoami
        }
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: coredns-custom-test-workload2
      namespace: kube-system
    data:
      # log.override: |
      #   log
      # custom.server: |
      #   example.io:8053 {
      #     forward . 8.8.8.8
      #   }
      custom.hosts: |    
        test-workload2.hosts test-workload2.sylva {
            hosts {
                192.168.200.55 api.test-workload2.sylva
                192.168.200.55 api-int.test-workload2.sylva
                192.168.200.55 apps.test-workload2.sylva            
                fallthrough
            }
            whoami
        }    
    ---
    apiVersion: v1
    data:
      Corefile: |
        import /etc/coredns/custom*/custom.hosts
        sylva:53 {
            errors
            forward sylva 192.168.20.181
        }
        .:53 {
            errors
            health {
               lameduck 5s
            }
            ready
            kubernetes cluster.local in-addr.arpa ip6.arpa {
               pods insecure
               fallthrough in-addr.arpa ip6.arpa
               ttl 30
            }
            prometheus :9153
            forward . /etc/resolv.conf {
               max_concurrent 1000
            }
            cache 30
            loop
            reload
            loadbalance
            import /etc/coredns/custom/*.override
        }
        import /etc/coredns/custom/*.server
    kind: ConfigMap
    metadata:
      name: rke2-coredns-rke2-coredns
      namespace: kube-system

    kubectl apply -f coredns-dynamic-hosts-injection.yaml

    Patch the Deployment (via kubectl edit) with:

    # .spec.template.spec.volumes
    - configMap:
        defaultMode: 420
        items:
          - key: Corefile
            path: Corefile
        name: rke2-coredns-rke2-coredns
      name: config-volume
    - configMap:
        defaultMode: 420
        items:
          - key: custom.hosts
            path: custom.hosts
        name: coredns-custom-test-workload1
      name: coredns-custom-test-workload1
    - configMap:
        defaultMode: 420
        items:
          - key: custom.hosts
            path: custom.hosts
        name: coredns-custom-test-workload2
      name: coredns-custom-test-workload2
    # .spec.template.spec.containers[0].volumeMounts
        - mountPath: /etc/coredns
          name: config-volume
        - mountPath: /etc/coredns/custom-test-workload1
          name: coredns-custom-test-workload1
        - mountPath: /etc/coredns/custom-test-workload2
          name: coredns-custom-test-workload2

    Try from a test pod:

    $ kubectl -n kube-system get pod -l app.kubernetes.io/name=rke2-coredns  -w
    NAME                                         READY   STATUS        RESTARTS   AGE
    rke2-coredns-rke2-coredns-6645b8d469-bpdbk   1/1     Terminating   0          31m
    rke2-coredns-rke2-coredns-6d489f6447-8llkm   1/1     Running       0          40s
    rke2-coredns-rke2-coredns-6d489f6447-jpglg   1/1     Running       0          40s
    rke2-coredns-rke2-coredns-6645b8d469-bpdbk   1/1     Terminating   0          31m
    rke2-coredns-rke2-coredns-6645b8d469-bpdbk   0/1     Terminating   0          31m
    rke2-coredns-rke2-coredns-6645b8d469-bpdbk   0/1     Terminating   0          31m
    rke2-coredns-rke2-coredns-6645b8d469-bpdbk   0/1     Terminating   0          31m
    rke2-coredns-rke2-coredns-6645b8d469-bpdbk   0/1     Terminating   0          31m
    
    $ kubectl -n test-workload1 run -it network-testing --image nicolaka/netshoot:v0.13
    
     network-testing  ~  ping api.test-workload2.sylva
    PING api.test-workload2.sylva (192.168.200.55) 56(84) bytes of data.
    ^C
    --- api.test-workload2.sylva ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 1027ms
    
    
     network-testing  ~  ping api.test-workload1.sylva
    PING api.test-workload1.sylva (192.168.100.3) 56(84) bytes of data.
    ^C
    --- api.test-workload1.sylva ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 1027ms
    
    
     network-testing  ~  nslookup api.test-workload2.sylva 100.73.0.10
    Server:         100.73.0.10
    Address:        100.73.0.10#53
    
    Name:   api.test-workload2.sylva
    Address: 192.168.200.55
    
    
     network-testing  ~  nslookup api.test-workload1.sylva 100.73.0.10
    Server:         100.73.0.10
    Address:        100.73.0.10#53
    
    Name:   api.test-workload1.sylva
    Address: 192.168.100.3
    
    
     network-testing  ~ 
    
    $ kubectl -n test-workload1 get pod network-testing -o yaml | yq .spec.dnsPolicy
    ClusterFirst
    $

CC: @jianzzha @ionut.spanu @tmmorin
