RKE2 bootstrap provider does not actually pass etcd extra args to etcd
cc: @matrohon @baburciu @tmmorin
Summary
During management cluster deployment with CABPR, we need to set three etcd startup parameters to specific values:
- quota-backend-bytes
- auto-compaction-retention
- auto-compaction-mode
According to the RKE2ControlPlane documentation, this can be done through serverConfig.etcd.customConfig.extraArgs:
kubectl explain RKE2ControlPlane.spec.serverConfig.etcd.customConfig.extraArgs
GROUP: controlplane.cluster.x-k8s.io
KIND: RKE2ControlPlane
VERSION: v1alpha1
FIELD: extraArgs <[]string>
DESCRIPTION:
ExtraArgs is a list of command line arguments (format: flag=value) to pass
to a Kubernetes Component command.
Steps to reproduce
RKE2ControlPlane manifest:
apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
kind: RKE2ControlPlane
metadata:
  annotations:
    meta.helm.sh/release-name: cluster
    meta.helm.sh/release-namespace: sylva-system
  ...
  name: management-cluster-control-plane
  namespace: sylva-system
  ...
spec:
  ....
  replicas: 1
  rolloutStrategy:
    rollingUpdate:
      maxSurge: 1
    type: RollingUpdate
  serverConfig:
    cni: calico
    disableComponents: {}
    etcd:
      backupConfig: {}
      customConfig:
        extraArgs:
        - quota-backend-bytes=5368709120
        - auto-compaction-mode=periodic
        - auto-compaction-retention=5h
We use serverConfig.etcd.customConfig.extraArgs to pass the additional arguments.
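As a sanity check on the quota value, the requested 5368709120 bytes correspond to 5 GiB. A minimal sketch of the conversion follows; the helper name `gib_to_bytes` is hypothetical and only used for illustration.

```shell
# Convert a size in GiB to bytes, the unit expected by quota-backend-bytes.
# gib_to_bytes is a hypothetical helper name, not part of any tool.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 5   # value requested in the manifest: 5368709120
gib_to_bytes 2   # value seen when the setting is ignored: 2147483648
```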
What is the current bug behavior?
On a control plane node, the etcd binary is started with the following command line:
etcd --config-file=/var/lib/rancher/rke2/server/db/etcd/config
The configuration file does not reflect the parameters set in the RKE2ControlPlane.
Furthermore, the container logs show that these three parameters are left at their default values:
{"level":"info","ts":"2024-01-12T09:51:10.065896Z","caller":"embed/etcd.go:309","msg":"starting an etcd server","etcd-version":"3.5.9","git-sha":"Not provided (use ./build instead of go build)","go-version":"go1.20.4 X:boringcrypto","go-os":"linux","go-arch":"amd64","max-cpu-set":8,"max-cpu-available":8,"member-initialized":false,"name":"management-cluster-cp-6c50a53380-8fh8b-3ae9be38","data-dir":"/var/lib/rancher/rke2/server/db/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/rancher/rke2/server/db/etcd/member","force-new-cluster":false,"heartbeat-interval":"500ms","election-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":10000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://192.168.129.53:2380"],"listen-peer-urls":["https://127.0.0.1:2380","https://192.168.129.53:2380"],"advertise-client-urls":["https://192.168.129.53:2379"],"listen-client-urls":["https://127.0.0.1:2379","https://192.168.129.53:2379"],"listen-metrics-urls":["http://127.0.0.1:2381"],"cors":[""],"host-whitelist":[""],"initial-cluster":"management-cluster-cp-6c50a53380-8fh8b-3ae9be38=https://192.168.129.53:2380","initial-cluster-state":"new","initial-cluster-token":"etcd-cluster","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":true,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
- "quota-backend-bytes":2147483648 (default of 2 GiB)
- "auto-compaction-mode":""
- "auto-compaction-retention":"0s"
It seems that extraArgs are completely ignored.
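The three values can be pulled out of the "starting an etcd server" log line without reading the whole JSON by eye. This is a rough sketch that assumes only grep, cut and tr are available on the node; the function name `etcd_log_field` is hypothetical, and it only handles scalar values (array values containing commas would be truncated).

```shell
# Print the value of one scalar key from an etcd JSON log line read on stdin.
# etcd_log_field is a hypothetical helper; it does not parse JSON properly.
etcd_log_field() {
  key=$1
  grep -o "\"$key\":[^,}]*" | head -n 1 | cut -d: -f2- | tr -d '"'
}

# Shortened excerpt of the log line above, keeping only the three fields.
sample_line='{"quota-backend-bytes":2147483648,"auto-compaction-mode":"","auto-compaction-retention":"0s"}'

echo "$sample_line" | etcd_log_field quota-backend-bytes        # prints 2147483648
echo "$sample_line" | etcd_log_field auto-compaction-retention  # prints 0s

# On a node, the input could come from the container logs, for example:
#   crictl logs <container-id> 2>&1 | grep 'starting an etcd server' | etcd_log_field quota-backend-bytes
```

If the extra arguments were applied, the same commands would be expected to print 5368709120 and 5h0m0s-style values instead of the defaults.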
What is the expected correct behavior?
We expect the RKE2Config generated for the control plane to reflect the serverConfig section:
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha1
kind: RKE2Config
metadata:
  creationTimestamp: "2024-01-12T13:09:03Z"
  generation: 1
  labels:
    cluster.x-k8s.io/cluster-name: management-cluster
    cluster.x-k8s.io/control-plane: ""
  name: management-cluster-control-plane-fx9b4
  namespace: sylva-system
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Machine
    name: management-cluster-control-plane-h5pwv
    uid: addc2cd8-3447-4b2a-8ba7-d85a5881247d
  resourceVersion: "8154"
  uid: 634f9d73-3c55-4d88-9b42-13c154f1324c
spec:
  agentConfig:
    additionalUserData:
      config: |
        {}
    cisProfile: cis-1.23
    format: cloud-config
    kubelet:
      extraArgs:
      - anonymous-auth=false
      - provider-id=openstack:///{{ ds.meta_data.uuid }}
    nodeLabels:
    - sylva.org/annotate-node-from-label=true
    ntp:
      enabled: false
    version: v1.26.9+rke2r1
  files:
  ...
  preRKE2Commands:
  - echo "Preparing RKE2 bootstrap" > /var/log/my-custom-file.log
  - |
    OS_DISTRIBUTOR=$(lsb_release -i | awk -F ' ' '{print $3}')
    if [[ "${OS_DISTRIBUTOR}" == "openSUSE" ]]; then
      ASSIGN_DNS_FROM_DHCP=true
      DNS_STATIC_SERVERS=_unused_
      DNS_POLICY="auto"
      if [[ ! "${ASSIGN_DNS_FROM_DHCP}" == "true" ]]; then
        DNS_POLICY="STATIC"
        sed -i "s/NETCONFIG_DNS_STATIC_SERVERS=.*/NETCONFIG_DNS_STATIC_SERVERS=${DNS_STATIC_SERVERS}/" /etc/sysconfig/network/config
      fi
      sed -i "s/NETCONFIG_DNS_POLICY=.*/NETCONFIG_DNS_POLICY=${DNS_POLICY}/" /etc/sysconfig/network/config
      netconfig update -f
    fi
  - echo "fs.inotify.max_user_watches = 524288" >> /etc/sysctl.conf
  - echo "fs.inotify.max_user_instances = 512" >> /etc/sysctl.conf
  - sysctl --system
  - export HTTP_PROXY=http://proxy.rd.francetelecom.fr:8080
  - export HTTPS_PROXY=http://proxy.rd.francetelecom.fr:8080
  - export NO_PROXY=127.0.0.0/8,localhost,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.sylva,.cluster.local,.cluster.local.,.svc,10.43.0.0/16,10.42.0.0/16
  - echo 'alias ctr="/var/lib/rancher/rke2/bin/ctr --namespace k8s.io --address /run/k3s/containerd/containerd.sock"'
    >> /root/.bashrc
  - echo 'alias crictl="/var/lib/rancher/rke2/bin/crictl --runtime-endpoint /run/k3s/containerd/containerd.sock"'
    >> /root/.bashrc
  - echo 'alias kubectl="KUBECONFIG=/etc/rancher/rke2/rke2.yaml /var/lib/rancher/rke2/bin/kubectl"'
    >> /root/.bashrc
  privateRegistriesConfig: {}
Relevant logs and/or screenshots
On a control plane node, /etc/rancher/rke2/config.yaml contains:
cluster-cidr: 10.42.0.0/16
cni:
- calico
kubelet-arg:
- anonymous-auth=false
- provider-id=openstack:///781887d5-1e97-444d-960e-3f57f4e569f8
node-label:
- sylva.org/annotate-node-from-label=true
profile: cis-1.23
service-cidr: 10.43.0.0/16
tls-san:
- 192.168.128.230
token: 554bddfb7990a77c8466402dba4f9a6d
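RKE2 accepts extra etcd flags through the `etcd-arg` key of its server configuration, and no such key appears in the file above. A quick way to check a node is sketched below, assuming the entries are written as an unquoted YAML list directly under `etcd-arg:`; the function name `list_etcd_args` is hypothetical, and whether CABPR is meant to render extraArgs into this exact key is an assumption here.

```shell
# List the etcd-arg entries of an RKE2 config file (path given as $1).
# Prints nothing when the key is absent.
# list_etcd_args is a hypothetical helper; it is not a full YAML parser.
list_etcd_args() {
  awk '
    /^etcd-arg:/ { in_list = 1; next }                            # start of the list
    in_list && /^[ \t]*-/ { sub(/^[ \t]*-[ \t]*/, ""); print; next }  # one list entry
    { in_list = 0 }                                               # any other line ends the list
  ' "$1"
}

# Usage on a control plane node:
#   list_etcd_args /etc/rancher/rke2/config.yaml
```

With the manifest from the reproduction steps applied correctly, the three `flag=value` entries would be expected in the output; on the node shown here the output is empty.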
$>crictl ps --name=etcd
CONTAINER      IMAGE          CREATED        STATE    NAME  ATTEMPT  POD ID         POD
c269dc0be9e86  c6b7a4f2f79b2  9 minutes ago  Running  etcd  0        fbca57c3a6397  etcd-management-cluster-cp-4e6dfce526-2fgg2
root@management-cluster-cp-4e6dfce526-2fgg2:/home/ubuntu#
Inside the container:
root@management-cluster-cp-4e6dfce526-2fgg2:/home/ubuntu# crictl exec -it c269dc0be9e86 /bin/sh
I0112 13:37:30.431574 58666 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/run/k3s/containerd/containerd.sock" URL="unix:///run/k3s/containerd/containerd.sock"
sh-4.4$ ps -edf
UID PID PPID C STIME TTY TIME CMD
999 1 0 9 13:03 ? 00:03:16 etcd --config-file=/var/lib/rancher/rke2/server/db/etcd/config
999 31 0 0 13:37 pts/0 00:00:00 /bin/sh
999 37 31 62 13:37 pts/0 00:00:00 ps -edf
sh-4.4$ cat /var/lib/rancher/rke2/server/db/etcd/config
advertise-client-urls: https://192.168.129.63:2379
client-transport-security:
  cert-file: /var/lib/rancher/rke2/server/tls/etcd/server-client.crt
  client-cert-auth: true
  key-file: /var/lib/rancher/rke2/server/tls/etcd/server-client.key
  trusted-ca-file: /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
data-dir: /var/lib/rancher/rke2/server/db/etcd
election-timeout: 5000
experimental-initial-corrupt-check: true
heartbeat-interval: 500
initial-advertise-peer-urls: https://192.168.129.63:2380
initial-cluster: management-cluster-cp-4e6dfce526-2fgg2-33bfbbcc=https://192.168.129.63:2380
initial-cluster-state: new
listen-client-urls: https://127.0.0.1:2379,https://192.168.129.63:2379
listen-metrics-urls: http://127.0.0.1:2381
listen-peer-urls: https://127.0.0.1:2380,https://192.168.129.63:2380
log-outputs:
- stderr
logger: zap
name: management-cluster-cp-4e6dfce526-2fgg2-33bfbbcc
peer-transport-security:
  cert-file: /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.crt
  client-cert-auth: true
  key-file: /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.key
  trusted-ca-file: /var/lib/rancher/rke2/server/tls/etcd/peer-ca.crt
snapshot-count: 10000
sh-4.4$
Again, there is no trace of the configured parameters.