
RKE2 bootstrap provider does not actually pass etcd extra args to etcd

cc: @matrohon @baburciu @tmmorin

Summary

During management cluster deployment with CABPR, we need to set specific values for three etcd startup parameters:

  • quota-backend-bytes
  • auto-compaction-retention
  • auto-compaction-mode

According to the RKE2ControlPlane documentation, this can be done through serverConfig.etcd.customConfig.extraArgs:

kubectl explain RKE2ControlPlane.spec.serverConfig.etcd.customConfig.extraArgs
GROUP:      controlplane.cluster.x-k8s.io
KIND:       RKE2ControlPlane
VERSION:    v1alpha1

FIELD: extraArgs <[]string>

DESCRIPTION:
    ExtraArgs is a list of command line arguments (format: flag=value) to pass
    to a Kubernetes Component command.

Steps to reproduce

RKE2ControlPlane manifest:

apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
kind: RKE2ControlPlane
metadata:
  annotations:
    meta.helm.sh/release-name: cluster
    meta.helm.sh/release-namespace: sylva-system
...
  name: management-cluster-control-plane
  namespace: sylva-system
...
spec:
....
  replicas: 1
  rolloutStrategy:
    rollingUpdate:
      maxSurge: 1
    type: RollingUpdate
  serverConfig:
    cni: calico
    disableComponents: {}
    etcd:
      backupConfig: {}
      customConfig:
        extraArgs:
        - quota-backend-bytes=5368709120
        - auto-compaction-mode=periodic
        - auto-compaction-retention=5h

We use serverConfig.etcd.customConfig.extraArgs to pass the additional arguments to etcd.

What is the current bug behavior?

On a control plane node, the etcd binary is started with the following command line:

etcd --config-file=/var/lib/rancher/rke2/server/db/etcd/config

The configuration file does not reflect the parameters set in the RKE2ControlPlane.

Furthermore, the container logs show that these three parameters are left at their default values:

{"level":"info","ts":"2024-01-12T09:51:10.065896Z","caller":"embed/etcd.go:309","msg":"starting an etcd server","etcd-version":"3.5.9","git-sha":"Not provided (use ./build instead of go build)","go-version":"go1.20.4 X:boringcrypto","go-os":"linux","go-arch":"amd64","max-cpu-set":8,"max-cpu-available":8,"member-initialized":false,"name":"management-cluster-cp-6c50a53380-8fh8b-3ae9be38","data-dir":"/var/lib/rancher/rke2/server/db/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/rancher/rke2/server/db/etcd/member","force-new-cluster":false,"heartbeat-interval":"500ms","election-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":10000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://192.168.129.53:2380"],"listen-peer-urls":["https://127.0.0.1:2380","https://192.168.129.53:2380"],"advertise-client-urls":["https://192.168.129.53:2379"],"listen-client-urls":["https://127.0.0.1:2379","https://192.168.129.53:2379"],"listen-metrics-urls":["http://127.0.0.1:2381"],"cors":[""],"host-whitelist":[""],"initial-cluster":"management-cluster-cp-6c50a53380-8fh8b-3ae9be38=https://192.168.129.53:2380","initial-cluster-state":"new","initial-cluster-token":"etcd-cluster","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":true,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
  • "quota-backend-bytes":2147483648 (the 2 GiB default)
  • "auto-compaction-mode":""
  • "auto-compaction-retention":"0s"

It seems that extraArgs are completely ignored.
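The comparison above can be automated. The following is a minimal sketch, assuming the etcd startup entry is available as a single JSON log line like the one quoted above. The helper names (`etcd_settings`, `mismatches`, `EXPECTED`) are hypothetical, and the `"5h0m0s"` rendering of a 5h retention is an assumption about how etcd prints durations, based on the other durations visible in the log.

```python
import json

# Values requested in the RKE2ControlPlane manifest.
# Assumption: etcd logs a 5h retention as the Go duration string "5h0m0s".
EXPECTED = {
    "quota-backend-bytes": 5368709120,
    "auto-compaction-mode": "periodic",
    "auto-compaction-retention": "5h0m0s",
}


def etcd_settings(log_line, keys=tuple(EXPECTED)):
    """Return the selected fields from etcd's 'starting an etcd server' entry."""
    entry = json.loads(log_line)
    if entry.get("msg") != "starting an etcd server":
        raise ValueError("not an etcd startup log entry")
    return {key: entry.get(key) for key in keys}


def mismatches(log_line, expected=EXPECTED):
    """Map each key whose logged value differs to a (expected, actual) pair."""
    actual = etcd_settings(log_line, tuple(expected))
    return {k: (v, actual[k]) for k, v in expected.items() if actual[k] != v}


# Trimmed copy of the startup entry quoted above (only relevant fields kept).
sample = json.dumps({
    "level": "info",
    "msg": "starting an etcd server",
    "quota-backend-bytes": 2147483648,
    "auto-compaction-mode": "",
    "auto-compaction-retention": "0s",
})

for key, (want, got) in mismatches(sample).items():
    print(f"{key}: expected {want!r}, got {got!r}")
```

On a node, the input line could be taken from the etcd container logs by selecting the entry whose `msg` is `starting an etcd server`.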

What is the expected correct behavior?

We expect the RKE2Config generated for the control plane to reflect the serverConfig section of the RKE2ControlPlane. The RKE2Config we observe:

apiVersion: bootstrap.cluster.x-k8s.io/v1alpha1
kind: RKE2Config
metadata:
  creationTimestamp: "2024-01-12T13:09:03Z"
  generation: 1
  labels:
    cluster.x-k8s.io/cluster-name: management-cluster
    cluster.x-k8s.io/control-plane: ""
  name: management-cluster-control-plane-fx9b4
  namespace: sylva-system
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Machine
    name: management-cluster-control-plane-h5pwv
    uid: addc2cd8-3447-4b2a-8ba7-d85a5881247d
  resourceVersion: "8154"
  uid: 634f9d73-3c55-4d88-9b42-13c154f1324c
spec:
  agentConfig:
    additionalUserData:
      config: |
        {}
    cisProfile: cis-1.23
    format: cloud-config
    kubelet:
      extraArgs:
      - anonymous-auth=false
      - provider-id=openstack:///{{ ds.meta_data.uuid }}
    nodeLabels:
    - sylva.org/annotate-node-from-label=true
    ntp:
      enabled: false
    version: v1.26.9+rke2r1
  files:
...
  preRKE2Commands:
  - echo "Preparing RKE2 bootstrap" > /var/log/my-custom-file.log
  - |
    OS_DISTRIBUTOR=$(lsb_release -i | awk -F ' ' '{print $3}')
    if [[ "${OS_DISTRIBUTOR}" == "openSUSE" ]]; then
      ASSIGN_DNS_FROM_DHCP=true
      DNS_STATIC_SERVERS=_unused_
      DNS_POLICY="auto"
      if [[ ! "${ASSIGN_DNS_FROM_DHCP}" == "true" ]]; then
        DNS_POLICY="STATIC"
        sed -i "s/NETCONFIG_DNS_STATIC_SERVERS=.*/NETCONFIG_DNS_STATIC_SERVERS=${DNS_STATIC_SERVERS}/" /etc/sysconfig/network/config
      fi
        sed -i "s/NETCONFIG_DNS_POLICY=.*/NETCONFIG_DNS_POLICY=${DNS_POLICY}/" /etc/sysconfig/network/config
        netconfig update -f
    fi
  - echo "fs.inotify.max_user_watches = 524288" >> /etc/sysctl.conf
  - echo "fs.inotify.max_user_instances = 512" >> /etc/sysctl.conf
  - sysctl --system
  - export HTTP_PROXY=http://proxy.rd.francetelecom.fr:8080
  - export HTTPS_PROXY=http://proxy.rd.francetelecom.fr:8080
  - export NO_PROXY=127.0.0.0/8,localhost,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.sylva,.cluster.local,.cluster.local.,.svc,10.43.0.0/16,10.42.0.0/16
  - echo 'alias ctr="/var/lib/rancher/rke2/bin/ctr --namespace k8s.io --address /run/k3s/containerd/containerd.sock"'
    >> /root/.bashrc
  - echo 'alias crictl="/var/lib/rancher/rke2/bin/crictl --runtime-endpoint /run/k3s/containerd/containerd.sock"'
    >> /root/.bashrc
  - echo 'alias kubectl="KUBECONFIG=/etc/rancher/rke2/rke2.yaml /var/lib/rancher/rke2/bin/kubectl"'
    >> /root/.bashrc
  privateRegistriesConfig: {}
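To make the expectation concrete, here is a rough sketch of the translation we would hope for. It assumes the etcd arguments should end up under the `etcd-arg` key of /etc/rancher/rke2/config.yaml, which is the RKE2 server option for custom etcd flags. This mapping is our assumption, not taken from the provider's code, and the function names are hypothetical.

```python
def to_rke2_server_config(extra_args):
    """Translate etcd extraArgs (flag=value strings) into a config.yaml fragment.

    Assumption: the arguments belong under RKE2's `etcd-arg` key.
    """
    for arg in extra_args:
        flag, sep, _value = arg.partition("=")
        if not (flag and sep):
            raise ValueError(f"expected flag=value, got {arg!r}")
    return {"etcd-arg": list(extra_args)}


def render_yaml(config):
    """Render a {key: [str, ...]} mapping in the list style used by config.yaml."""
    lines = []
    for key, values in config.items():
        lines.append(f"{key}:")
        lines.extend(f"- {value}" for value in values)
    return "\n".join(lines)


# The extraArgs from the RKE2ControlPlane manifest in this report.
extra_args = [
    "quota-backend-bytes=5368709120",
    "auto-compaction-mode=periodic",
    "auto-compaction-retention=5h",
]

print(render_yaml(to_rke2_server_config(extra_args)))
```

The printed fragment is what we would expect to find alongside the existing `kubelet-arg` entries in the node's config.yaml if the arguments were propagated.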

Relevant logs and/or screenshots

On a control plane node, /etc/rancher/rke2/config.yaml contains:

cluster-cidr: 10.42.0.0/16
cni:
- calico
kubelet-arg:
- anonymous-auth=false
- provider-id=openstack:///781887d5-1e97-444d-960e-3f57f4e569f8
node-label:
- sylva.org/annotate-node-from-label=true
profile: cis-1.23
service-cidr: 10.43.0.0/16
tls-san:
- 192.168.128.230
token: 554bddfb7990a77c8466402dba4f9a6d

$>crictl ps --name=etcd
CONTAINER           IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID              POD
c269dc0be9e86       c6b7a4f2f79b2       9 minutes ago       Running             etcd                0                   fbca57c3a6397       etcd-management-cluster-cp-4e6dfce526-2fgg2
root@management-cluster-cp-4e6dfce526-2fgg2:/home/ubuntu#

Inside the etcd container:

root@management-cluster-cp-4e6dfce526-2fgg2:/home/ubuntu# crictl exec -it c269dc0be9e86 /bin/sh
I0112 13:37:30.431574   58666 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/run/k3s/containerd/containerd.sock" URL="unix:///run/k3s/containerd/containerd.sock"
sh-4.4$ ps -edf
UID          PID    PPID  C STIME TTY          TIME CMD
999            1       0  9 13:03 ?        00:03:16 etcd --config-file=/var/lib/rancher/rke2/server/db/etcd/config
999           31       0  0 13:37 pts/0    00:00:00 /bin/sh
999           37      31 62 13:37 pts/0    00:00:00 ps -edf
sh-4.4$ cat /var/lib/rancher/rke2/server/db/etcd/config
advertise-client-urls: https://192.168.129.63:2379
client-transport-security:
  cert-file: /var/lib/rancher/rke2/server/tls/etcd/server-client.crt
  client-cert-auth: true
  key-file: /var/lib/rancher/rke2/server/tls/etcd/server-client.key
  trusted-ca-file: /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
data-dir: /var/lib/rancher/rke2/server/db/etcd
election-timeout: 5000
experimental-initial-corrupt-check: true
heartbeat-interval: 500
initial-advertise-peer-urls: https://192.168.129.63:2380
initial-cluster: management-cluster-cp-4e6dfce526-2fgg2-33bfbbcc=https://192.168.129.63:2380
initial-cluster-state: new
listen-client-urls: https://127.0.0.1:2379,https://192.168.129.63:2379
listen-metrics-urls: http://127.0.0.1:2381
listen-peer-urls: https://127.0.0.1:2380,https://192.168.129.63:2380
log-outputs:
- stderr
logger: zap
name: management-cluster-cp-4e6dfce526-2fgg2-33bfbbcc
peer-transport-security:
  cert-file: /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.crt
  client-cert-auth: true
  key-file: /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.key
  trusted-ca-file: /var/lib/rancher/rke2/server/tls/etcd/peer-ca.crt
snapshot-count: 10000
sh-4.4$

Again, there is no trace of the configured parameters.
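The same check can be scripted against the etcd config file. This is a minimal sketch, assuming the file is a flat YAML mapping whose top-level keys use the etcd flag names, as in the dump above. It only scans unindented `key:` lines with the standard library rather than parsing YAML fully, and the helper names are hypothetical.

```python
import re

# Settings requested through serverConfig.etcd.customConfig.extraArgs.
WANTED = [
    "quota-backend-bytes",
    "auto-compaction-mode",
    "auto-compaction-retention",
]


def top_level_keys(config_text):
    """Collect top-level keys, i.e. unindented `key:` lines."""
    pattern = re.compile(r"^([A-Za-z0-9_-]+):", re.MULTILINE)
    return {match.group(1) for match in pattern.finditer(config_text)}


def missing_settings(config_text, wanted=WANTED):
    """Return the wanted keys that do not appear at the top level."""
    present = top_level_keys(config_text)
    return [key for key in wanted if key not in present]


# Shortened copy of the config file shown above.
sample = """\
data-dir: /var/lib/rancher/rke2/server/db/etcd
election-timeout: 5000
heartbeat-interval: 500
log-outputs:
- stderr
logger: zap
peer-transport-security:
  client-cert-auth: true
snapshot-count: 10000
"""

print(missing_settings(sample))
```

For the file shown above, all three requested settings are reported as missing.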