Address OKD management cluster needs for static DNS entries of each workload cluster

The following discussion from !2654 (merged) should be addressed:

  • @baburciu started a discussion:

    I've learnt from @cristian.manda more about the DNS issues mentioned in !2654 (comment 2075463540) and https://sylva-projects.slack.com/archives/C04BQQANHA5/p1724679931890629, and I understand this implies a new requirement: the management cluster OKD pods (from one or both of the new units, most likely the okd-assisted-installer one) need to be able to resolve DNS A records like:

    api.<workload cluster name>.<workload cluster baseDomain>
    api-int.<workload cluster name>.<workload cluster baseDomain>
    apps.<workload cluster name>.<workload cluster baseDomain>

    mapped to the workload cluster IP used to expose the Kubernetes API.

    In terms of -n WORKLOAD_CLUSTER_NAMESPACE HelmRelease/sylva-units values, these mappings look like

    | FQDN | IP |
    |------|----|
    | {{ printf "%s.%s.%s" "api" .Values.cluster.name .Values.cluster.okd.baseDomain }} | {{ .Values.cluster.cluster_virtual_ip }} |
    | {{ printf "%s.%s.%s" "api-int" .Values.cluster.name .Values.cluster.okd.baseDomain }} | {{ .Values.cluster.cluster_virtual_ip }} |
    | {{ printf "%s.%s.%s" "apps" .Values.cluster.name .Values.cluster.okd.baseDomain }} | {{ .Values.cluster.cluster_virtual_ip }} |

    since this .Values.cluster.okd.baseDomain is used in the AgentControlPlane.spec.agentConfigSpec.baseDomain field of the CRD, per the s-c-c counterpart work.

    Please confirm whether that's the case, or explain what is actually needed if this understanding is not accurate.
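
    For illustration, with hypothetical example values cluster.name=workload1, cluster.okd.baseDomain=sylva and cluster_virtual_ip=192.168.100.3, these mappings would render to:

    api.workload1.sylva      192.168.100.3
    api-int.workload1.sylva  192.168.100.3
    apps.workload1.sylva     192.168.100.3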


    Now, for such a requirement I believe we can take the approach below:

    1. Have a workload-cluster-specific unit create a ConfigMap defining a CoreDNS hosts plugin block, like:
    # charts/sylva-units/workload-cluster.values.yaml
    units:
      coredns-custom-hosts-import:
        enabled: false
        info:
          description: create a ConfigMap containing the OKD workload cluster's DNS A records as a [CoreDNS hosts plugin](https://coredns.io/plugins/hosts/) block
          internal: true
        unit_templates:
        - base-deps
        enabled_conditions:
          - '{{ .Values.cluster.capi_providers.bootstrap_provider | eq "cabpob" }}'
        depends_on:
          kyverno-policies-ready: true
        repo: sylva-core
        kustomization_spec:
          path: ./kustomize-units/coredns-custom-hosts-import
          postBuild:
            substitute:
              CLUSTER_NAME: '{{ .Values.cluster.name }}'
              CLUSTER_VIRTUAL_IP: '{{ .Values.cluster.cluster_virtual_ip }}'
              CLUSTER_OKD_BASE_DOMAIN: '{{ .Values.cluster.okd.baseDomain }}'

    using a Kustomize resource like:

    # $ cat kustomize-units/coredns-custom-hosts-import/kustomization.yaml
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    resources:
    - configmap.yaml
    
    # $ cat kustomize-units/coredns-custom-hosts-import/configmap.yaml
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: coredns-custom-${CLUSTER_NAME}
      namespace: kube-system
      labels:
        sylva.io/coredns-custom-hosts-import: ${CLUSTER_NAME}
    data:
      custom.hosts: |
        ${CLUSTER_NAME}.hosts ${CLUSTER_NAME}.${CLUSTER_OKD_BASE_DOMAIN} {
            hosts {
                ${CLUSTER_VIRTUAL_IP} api.${CLUSTER_NAME}.${CLUSTER_OKD_BASE_DOMAIN}
                ${CLUSTER_VIRTUAL_IP} api-int.${CLUSTER_NAME}.${CLUSTER_OKD_BASE_DOMAIN}
                ${CLUSTER_VIRTUAL_IP} apps.${CLUSTER_NAME}.${CLUSTER_OKD_BASE_DOMAIN}
                fallthrough
            }
            whoami
        }
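
    As a quick sanity check (assuming a hypothetical workload cluster named workload1), the substituted ConfigMap can then be inspected on the management cluster with:

    kubectl -n kube-system get configmap coredns-custom-workload1 -o jsonpath='{.data.custom\.hosts}'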

    We'd enable this unit only for each workload cluster (that's why the definition was set directly in workload-cluster.values.yaml, but it's just a preference, it could also sit in the default sylva-units values).

    2. Then have a Kyverno ClusterPolicy that would watch every -n WORKLOAD_CLUSTER_NAMESPACE ConfigMap/coredns-custom-* and have the corresponding management cluster K8s resource reconciled, following the same approach as the one implemented by @feleouet in !1725 (merged).

    For CoreDNS, this would mean patching Deployment/coredns (for Kubeadm) or Deployment/rke2-coredns-rke2-coredns (for RKE2) (point for the future: do we have a different name in OKD? 🤔) by:

    • appending to .spec.template.spec.volumes:
    - configMap:
        defaultMode: 420
        items:
          - key: Corefile
            path: Corefile
        name: rke2-coredns-rke2-coredns
      name: config-volume
    - configMap:
        defaultMode: 420
        items:
          - key: custom.hosts
            path: custom.hosts
        name: coredns-custom-workload1
      name: coredns-custom-workload1
    • and appending .spec.template.spec.containers[0].volumeMounts with:
        - mountPath: /etc/coredns
          name: config-volume
        - mountPath: /etc/coredns/custom-workload1
          name: coredns-custom-workload1

    We could have the Kyverno ClusterPolicy as a Kustomize Component that is only deployed when the cabpob unit is enabled in the management cluster, like:

    units:
      kyverno-policies:
        :
        kustomization_spec:
          path: ./kustomize-units/kyverno-policies/generic
          :
          postBuild:
            substitute:
              :
              COREDNS_DEPLOYMENT_NAME: '{{ tuple (.Values.cluster.capi_providers.bootstrap_provider | eq "cabpk" | ternary "coredns" "rke2-coredns-rke2-coredns") (tuple . "cabpob" | include "unit-enabled") | include "set-only-if" }}'
          _components:
            :
            - '{{ tuple "components/coredns-custom-hosts-import" (tuple . "cabpob" | include "unit-enabled") | include "set-only-if" }}'

    # $ cat kustomize-units/kyverno-policies/generic/components/coredns-custom-hosts-import/kustomization.yaml
    ---
    apiVersion: kustomize.config.k8s.io/v1alpha1
    kind: Component
    
    resources:
    - coredns-deployment-patch.yaml
    
    # $ cat kustomize-units/kyverno-policies/generic/components/coredns-custom-hosts-import/coredns-deployment-patch.yaml
    ---
    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: mount-coredns-deployment-custom-hosts
      annotations:
        kustomize.toolkit.fluxcd.io/force: Enabled
        policies.kyverno.io/minversion: 1.12.0
        policies.kyverno.io/description: >-
          Patch the CoreDNS Deployment volumes and volumeMounts when a new ConfigMap carrying the 'sylva.io/coredns-custom-hosts-import' label is seen
    spec:
      mutateExistingOnPolicyUpdate: true
      webhookConfiguration:
        matchConditions:
        - name: is-coredns-custom-hosts-configmap
          expression: "has(object.metadata) && object.metadata.namespace == 'kube-system' && 'sylva.io/coredns-custom-hosts-import' in object.metadata.labels"
      rules:
      - name: mount-coredns-deployment-custom-hosts
        match:
          any:
          - resources:
              kinds:
              - ConfigMap
              namespaces:
              - kube-system
        mutate:
          targets:
          - apiVersion: apps/v1
            kind: Deployment
            name: ${COREDNS_DEPLOYMENT_NAME:-coredns}
            namespace: kube-system
          patchesJson6902: |-
            - op: add
              path: /spec/template/spec/volumes/-
              value:
                name: coredns-custom-{{ request.object.metadata.labels."sylva.io/coredns-custom-hosts-import" }}
                configMap:
                  defaultMode: 420
                  items:
                    - key: custom.hosts
                      path: custom.hosts
                  name: coredns-custom-{{ request.object.metadata.labels."sylva.io/coredns-custom-hosts-import" }}
            - op: add
              path: /spec/template/spec/containers/0/volumeMounts/-
              value:
                mountPath: /etc/coredns/custom-{{ request.object.metadata.labels."sylva.io/coredns-custom-hosts-import" }}
                name: coredns-custom-{{ request.object.metadata.labels."sylva.io/coredns-custom-hosts-import" }}
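
    Once the policy and a labeled ConfigMap are in place, a way to check that the target Deployment was actually mutated (assuming the RKE2 CoreDNS deployment name and a hypothetical workload cluster named workload1) could be:

    kubectl -n kube-system get deployment rke2-coredns-rke2-coredns \
      -o jsonpath='{.spec.template.spec.volumes[*].name}'
    # expected to list config-volume plus coredns-custom-workload1
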
    3. Have the coredns unit, already used in the management cluster, make use of the CoreDNS import plugin to inject the static DNS entries provided by each -n WORKLOAD_CLUSTER_NAMESPACE ConfigMap/coredns-custom-*, by calling it at the beginning of the Corefile:
    # kustomize-units/coredns/coredns-config.yaml
    
    data:
      Corefile: |
        import /etc/coredns/custom*/custom.hosts  # <===
        sylva:53 {
            errors
            forward ${CLUSTER_DOMAIN} ${CLUSTER_VIRTUAL_IP}
        }
        .:53 {
            errors
            health {
               lameduck 5s
            }
            ready
            kubernetes cluster.local in-addr.arpa ip6.arpa {
               pods insecure
               fallthrough in-addr.arpa ip6.arpa
               ttl 30
            }
            prometheus :9153
            forward . /etc/resolv.conf {
               max_concurrent 1000
            }
            cache 30
            loop
            reload
            loadbalance
        }

    which would effectively mean that every custom.hosts file under a directory matching the pattern /etc/coredns/custom* would have its contents injected into the actual Corefile.
    And such files are made available in the CoreDNS pod filesystem in the way presented in the previous points.
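
    For example, with two hypothetical workload clusters named workload1 and workload2, the CoreDNS container filesystem would contain:

    /etc/coredns/Corefile                        # from config-volume
    /etc/coredns/custom-workload1/custom.hosts   # from ConfigMap coredns-custom-workload1
    /etc/coredns/custom-workload2/custom.hosts   # from ConfigMap coredns-custom-workload2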

    After each patch triggered by a workload cluster creation, the CoreDNS pods would be recycled, and the OKD-specific controllers, assuming their pods use the ClusterFirst DNS policy, would then be able to resolve the workload-cluster-specific FQDNs.
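
    A simple way to follow that pod recycling (again assuming the RKE2 CoreDNS deployment name) would be:

    kubectl -n kube-system rollout status deployment/rke2-coredns-rke2-coredns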

    I did some testing prior to writing the above, and in principle the approach should work.

    Manual tests:

    Create the custom input ConfigMaps and alter the default CoreDNS ConfigMap (rke2-coredns-rke2-coredns) to allow the import:

    # coredns-dynamic-hosts-injection.yaml
    # following notes in https://docs.digitalocean.com/products/kubernetes/how-to/customize-coredns/
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: coredns-custom-test-workload1
      namespace: kube-system
    data:
      # log.override: |
      #   log
      # custom.server: |
      #   example.io:8053 {
      #     forward . 8.8.8.8
      #   }
      custom.hosts: |
        test-workload1.hosts test-workload1.sylva {
            hosts {
                192.168.100.3 api.test-workload1.sylva
                192.168.100.3 api-int.test-workload1.sylva
                192.168.100.3 apps.test-workload1.sylva            
                fallthrough
            }
            whoami
        }
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: coredns-custom-test-workload2
      namespace: kube-system
    data:
      # log.override: |
      #   log
      # custom.server: |
      #   example.io:8053 {
      #     forward . 8.8.8.8
      #   }
      custom.hosts: |    
        test-workload2.hosts test-workload2.sylva {
            hosts {
                192.168.200.55 api.test-workload2.sylva
                192.168.200.55 api-int.test-workload2.sylva
                192.168.200.55 apps.test-workload2.sylva            
                fallthrough
            }
            whoami
        }    
    ---
    apiVersion: v1
    data:
      Corefile: |
        import /etc/coredns/custom*/custom.hosts
        sylva:53 {
            errors
            forward sylva 192.168.20.181
        }
        .:53 {
            errors
            health {
               lameduck 5s
            }
            ready
            kubernetes cluster.local in-addr.arpa ip6.arpa {
               pods insecure
               fallthrough in-addr.arpa ip6.arpa
               ttl 30
            }
            prometheus :9153
            forward . /etc/resolv.conf {
               max_concurrent 1000
            }
            cache 30
            loop
            reload
            loadbalance
            import /etc/coredns/custom/*.override
        }
        import /etc/coredns/custom/*.server
    kind: ConfigMap
    metadata:
      name: rke2-coredns-rke2-coredns
      namespace: kube-system

    kubectl apply -f coredns-dynamic-hosts-injection.yaml

    Patch the Deployment (via kubectl edit) with:

    # .spec.template.spec.volumes
    - configMap:
        defaultMode: 420
        items:
          - key: Corefile
            path: Corefile
        name: rke2-coredns-rke2-coredns
      name: config-volume
    - configMap:
        defaultMode: 420
        items:
          - key: custom.hosts
            path: custom.hosts
        name: coredns-custom-test-workload1
      name: coredns-custom-test-workload1
    - configMap:
        defaultMode: 420
        items:
          - key: custom.hosts
            path: custom.hosts
        name: coredns-custom-test-workload2
      name: coredns-custom-test-workload2
    # .spec.template.spec.containers[0].volumeMounts
        - mountPath: /etc/coredns
          name: config-volume
        - mountPath: /etc/coredns/custom-test-workload1
          name: coredns-custom-test-workload1
        - mountPath: /etc/coredns/custom-test-workload2
          name: coredns-custom-test-workload2

    Try from a test pod:

    $ kubectl -n kube-system get pod -l app.kubernetes.io/name=rke2-coredns  -w
    NAME                                         READY   STATUS        RESTARTS   AGE
    rke2-coredns-rke2-coredns-6645b8d469-bpdbk   1/1     Terminating   0          31m
    rke2-coredns-rke2-coredns-6d489f6447-8llkm   1/1     Running       0          40s
    rke2-coredns-rke2-coredns-6d489f6447-jpglg   1/1     Running       0          40s
    rke2-coredns-rke2-coredns-6645b8d469-bpdbk   1/1     Terminating   0          31m
    rke2-coredns-rke2-coredns-6645b8d469-bpdbk   0/1     Terminating   0          31m
    rke2-coredns-rke2-coredns-6645b8d469-bpdbk   0/1     Terminating   0          31m
    rke2-coredns-rke2-coredns-6645b8d469-bpdbk   0/1     Terminating   0          31m
    rke2-coredns-rke2-coredns-6645b8d469-bpdbk   0/1     Terminating   0          31m
    
    $ kubectl -n test-workload1 run -it network-testing --image nicolaka/netshoot:v0.13
    
     network-testing  ~  ping api.test-workload2.sylva
    PING api.test-workload2.sylva (192.168.200.55) 56(84) bytes of data.
    ^C
    --- api.test-workload2.sylva ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 1027ms
    
    
     network-testing  ~  ping api.test-workload1.sylva
    PING api.test-workload1.sylva (192.168.100.3) 56(84) bytes of data.
    ^C
    --- api.test-workload1.sylva ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 1027ms
    
    
     network-testing  ~  nslookup api.test-workload2.sylva 100.73.0.10
    Server:         100.73.0.10
    Address:        100.73.0.10#53
    
    Name:   api.test-workload2.sylva
    Address: 192.168.200.55
    
    
     network-testing  ~  nslookup api.test-workload1.sylva 100.73.0.10
    Server:         100.73.0.10
    Address:        100.73.0.10#53
    
    Name:   api.test-workload1.sylva
    Address: 192.168.100.3
    
    
     network-testing  ~ 
    
    $ kubectl -n test-workload1 get pod network-testing -o yaml | yq .spec.dnsPolicy
    ClusterFirst
    $

CC: @jianzzha @ionut.spanu @tmmorin
