Current keepalived/VRRP setup is not stable on OVN-based OpenStack deployments
The gateway nodes use VRRP for high availability of the WireGuard, legacy load balancer and ch-k8s-lbaas services. To use VRRP within OpenStack with a floating IP, one needs to allocate a dummy Port object (which reserves an internal IP) and attach the floating IP to that port. This is what we do in https://gitlab.com/yaook/k8s/-/blob/1754445f1f70af80abd9438b51424294cd0d71d0/terraform/20-gateway.tf#L13-45. We call that port the VIP ("virtual IP") port.
This port is left unattached and is shown as `DOWN` in OpenStack (which is fine). The traffic flows through the actual gateway VM ports (https://gitlab.com/yaook/k8s/-/blob/1754445f1f70af80abd9438b51424294cd0d71d0/terraform/20-gateway.tf#L48-66), where we disable Port Security in order to be allowed to communicate using the IP address of the VIP port.
That setup worked fine in OpenStack clouds based on the older Open vSwitch ML2 plugin, but it no longer works with the more modern OVN ML2 plugin.
The only reliable way we know so far to make this work is to add the IP address of the VIP port to the `allowed_address_pairs` of all gateway ports.
This needs to be added to the Terraform files. However, it also requires us to enable port security. To do that, we need to reintroduce the `barndoor` security group last seen in 22b3e2cd, which we'll have to set on the gateway ports because the ch-k8s-lbaas implementation needs all kinds of traffic to be allowed.
To summarize, the concrete action items for now are:
- Add the gateway VIP port's fixed IP to `allowed_address_pairs` of each gateway's port
- Re-enable port security on the gateway's ports
- Add the `barndoor` security group (allowing all traffic) to the gateway's ports
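
As a rough sketch of what these three items could look like together in Terraform (this is not the actual content of `20-gateway.tf`: the resource names, the variables `network_id`, `subnet_id` and `gateway_count`, and the exact rule set of the security group are assumptions made for illustration):

```hcl
terraform {
  required_providers {
    openstack = {
      source = "terraform-provider-openstack/openstack"
    }
  }
}

# Hypothetical inputs; the real module wires these up differently.
variable "network_id" {
  type = string
}

variable "subnet_id" {
  type = string
}

variable "gateway_count" {
  type    = number
  default = 3
}

# "barndoor" security group: allow all ingress traffic.
# Egress is covered by the default egress rules Neutron adds to a new group.
resource "openstack_networking_secgroup_v2" "barndoor" {
  name        = "barndoor"
  description = "Allow all traffic (assumed rule set)"
}

resource "openstack_networking_secgroup_rule_v2" "barndoor_ingress_v4" {
  direction         = "ingress"
  ethertype         = "IPv4"
  remote_ip_prefix  = "0.0.0.0/0"
  security_group_id = openstack_networking_secgroup_v2.barndoor.id
}

resource "openstack_networking_secgroup_rule_v2" "barndoor_ingress_v6" {
  direction         = "ingress"
  ethertype         = "IPv6"
  remote_ip_prefix  = "::/0"
  security_group_id = openstack_networking_secgroup_v2.barndoor.id
}

# The unattached VIP port that only reserves the internal IP.
resource "openstack_networking_port_v2" "gateway_vip" {
  name       = "gateway-vip"
  network_id = var.network_id

  fixed_ip {
    subnet_id = var.subnet_id
  }
}

# The gateway VM ports: port security on, barndoor attached,
# and the VIP's fixed IP whitelisted as an allowed address pair.
resource "openstack_networking_port_v2" "gateway" {
  count      = var.gateway_count
  name       = "gateway-${count.index}"
  network_id = var.network_id

  port_security_enabled = true
  security_group_ids    = [openstack_networking_secgroup_v2.barndoor.id]

  fixed_ip {
    subnet_id = var.subnet_id
  }

  allowed_address_pairs {
    ip_address = openstack_networking_port_v2.gateway_vip.all_fixed_ips[0]
  }
}
```

Referencing `all_fixed_ips[0]` of the VIP port keeps the allowed address in sync with whatever IP OpenStack assigns, rather than hard-coding it.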
Then we need fixes in ch-k8s-lbaas which do the same, and those are much more complicated. I'll have to file an issue about that later.