Change MTU size logic in sylva-core

What does this MR do and why?

This MR addresses the problem stated in sylva-core #1725 (closed). Currently, we set the MTU of the Calico interfaces to the same value as that of the network interfaces (except when WireGuard is enabled). This leads to conflicts and is not considered best practice. By default, the IPPool CRD that we use in Calico has VXLAN enabled, as follows:

apiVersion: v1
items:
- apiVersion: crd.projectcalico.org/v1
  kind: IPPool
  metadata:
    creationTimestamp: "2024-11-13T07:41:42Z"
    generation: 1
    labels:
      app.kubernetes.io/managed-by: tigera-operator
    name: default-ipv4-ippool
    resourceVersion: "1088"
    uid: 2c9934bb-08d1-4d21-b57e-bda0e2786cb2
  spec:
    allowedUses:
    - Workload
    - Tunnel
    blockSize: 26
    cidr: 100.72.0.0/16
    ipipMode: Never
    natOutgoing: true
    nodeSelector: all()
    vxlanMode: Always
kind: List
metadata:
  resourceVersion: ""

According to the official Tigera documentation, IPv4 VXLAN uses a 50-byte header, which must be subtracted from the network interface MTU; in this case, however, the network interface MTU is the same as the one set on the Calico interfaces. This leads to conflicts on the Calico nodes, as can currently be observed:

2024-11-13 09:27:20.116 [WARNING][88] felix/vxlan_mgr.go 757: Failed to set vxlan tunnel device MTU error=invalid argument ipVersion=0x4
2024-11-13 09:27:30.119 [INFO][88] felix/vxlan_mgr.go 755: VXLAN device MTU needs to be updated device="vxlan.calico" ipVersion=0x4 new=1450 old=1400
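The numbers in the log reflect the 50-byte VXLAN overhead: for a network interface MTU of 1450, the vxlan.calico device should get 1400, not 1450. A minimal sanity check (the function name is illustrative, not part of the MR):

```python
# VXLAN over IPv4 adds a 50-byte header (per the Tigera MTU documentation).
VXLAN_IPV4_OVERHEAD = 50

def vxlan_device_mtu(interface_mtu: int) -> int:
    """MTU that the vxlan.calico device should get for a given interface MTU."""
    return interface_mtu - VXLAN_IPV4_OVERHEAD

# Matches the felix log above: the correct device MTU is 1400, not 1450.
print(vxlan_device_mtu(1450))  # 1400
```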

To fix this, we need to set different MTU values on the network interfaces and the Calico interfaces. This is done via the calico_mtu field. The logic follows the official Tigera documentation. To fetch the different encapsulation methods used by Calico, the ipPools section was added to values.yaml.

Moving forward, the logic is simple: the Calico MTU is calculated from the network interface MTU by subtracting only the overhead of the encapsulation method in use. If WireGuard is used alongside another encapsulation, only the highest overhead is subtracted from the network MTU. From the Tigera documentation portal:

If you have a mix of WireGuard and either IP in IP or VXLAN in your cluster, you should configure the MTU to be the smallest of the values of each encap type.
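That rule can be sketched as follows (the function and overhead table are illustrative, not the actual sylva-core implementation; the per-encapsulation overhead values are taken from the Tigera MTU documentation):

```python
# Per-encapsulation overheads in bytes, from the Tigera MTU documentation:
# IP-in-IP adds 20 bytes, VXLAN 50 (IPv4) / 70 (IPv6),
# WireGuard 60 (IPv4) / 80 (IPv6).
OVERHEAD = {
    "ipip": 20,
    "vxlan_ipv4": 50,
    "vxlan_ipv6": 70,
    "wireguard_ipv4": 60,
    "wireguard_ipv6": 80,
}

def calico_mtu(interface_mtu: int, encaps: list[str]) -> int:
    """Smallest MTU across the encapsulation types in use, i.e. the
    interface MTU minus the largest applicable overhead."""
    if not encaps:
        return interface_mtu
    return interface_mtu - max(OVERHEAD[e] for e in encaps)

# VXLAN only: 1450 - 50 = 1400
print(calico_mtu(1450, ["vxlan_ipv4"]))
# Mixed WireGuard + VXLAN: the larger (WireGuard, 60-byte) overhead wins.
print(calico_mtu(1500, ["vxlan_ipv4", "wireguard_ipv4"]))  # 1440
```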

The current implementation is developed for the case where the infra_provider is CAPM3.

Related reference(s)

Closes #1725 (closed).

Test coverage

This feature was tested on a Metal3 deployment, and all the values were updated accordingly.

CI configuration

Below you can choose test deployment variants to run in this MR's CI.

Click to open the CI configuration

Legend:

Icon Meaning Available values
☁️ Infra Provider capd, capo, capm3
🚀 Bootstrap Provider kubeadm (alias kadm), rke2
🐧 Node OS ubuntu, suse
🛠️ Deployment Options light-deploy, dev-sources, ha, misc, maxsurge-0, logging
🎬 Pipeline Scenarios Available scenario list and description
  • 🎬 preview ☁️ capd 🚀 kadm 🐧 ubuntu

  • 🎬 preview ☁️ capo 🚀 rke2 🐧 suse

  • 🎬 preview ☁️ capm3 🚀 rke2 🐧 ubuntu

  • ☁️ capd 🚀 kadm 🛠️ light-deploy 🐧 ubuntu

  • ☁️ capd 🚀 rke2 🛠️ light-deploy 🐧 suse

  • ☁️ capo 🚀 rke2 🐧 suse

  • ☁️ capo 🚀 kadm 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capo 🚀 kadm 🎬 wkld-k8s-upgrade 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha 🐧 suse

  • ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.3.x 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🐧 suse

  • ☁️ capm3 🚀 kadm 🐧 ubuntu

  • ☁️ capm3 🚀 kadm 🎬 rolling-update-no-wkld 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 wkld-k8s-upgrade 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 sylva-upgrade-from-1.3.x 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 suse

Global config for deployment pipelines

  • autorun pipelines
  • allow failure on pipelines
  • record sylvactl events

Notes:

  • Enabling autorun makes deployment pipelines run automatically, without human interaction.
  • Disabling allow failure makes deployment pipelines mandatory for pipeline success.
  • If both autorun and allow failure are disabled, deployment pipelines will need manual triggering but will block the pipeline.

Be aware: after a configuration change, the pipeline is not triggered automatically. Please run it manually (by clicking the run pipeline button in the Pipelines tab) or push new code.

Edited by Thomas Morin
