cluster-maxunavailable: don't rely only on status.unavailableReplicas
Closes sylva-projects/sylva-core#2786
The issue described in sylva-projects/sylva-core#2786 happens because the cluster-maxunavailable controller relies on `status.unavailableReplicas` to determine whether a node rolling update is in progress. This misses one corner case: during the short time window where a Machine has been removed but its replacement Machine has not been created yet, there is no unavailable Machine (yet), so `unavailableReplicas` is zero, and our controller currently concludes, wrongly, that everything is ready.
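To make the corner case concrete, here is a minimal Go sketch of the previous check. It assumes a simplified, hypothetical `replicaCounts` type and a hypothetical `oldRollingUpdateInProgress` function; neither is the controller's actual code or a Cluster API type. The snapshot values are illustrative: 3 desired Machines, one already deleted, its replacement not created yet.

```go
package main

import "fmt"

// replicaCounts is a hypothetical, simplified view of the replica fields
// discussed in this MR; it is NOT the Cluster API Go type.
type replicaCounts struct {
	SpecReplicas        int32 // spec.replicas
	AvailableReplicas   int32 // status.availableReplicas
	UnavailableReplicas int32 // status.unavailableReplicas
}

// oldRollingUpdateInProgress mirrors the previous logic: a rolling update
// is considered in progress only if status.unavailableReplicas is non-zero.
func oldRollingUpdateInProgress(c replicaCounts) bool {
	return c.UnavailableReplicas > 0
}

func main() {
	// Illustrative snapshot of the corner case: only 2 Machines exist and
	// both are available, so nothing is reported as unavailable even though
	// one desired Machine is missing.
	window := replicaCounts{SpecReplicas: 3, AvailableReplicas: 2, UnavailableReplicas: 0}
	fmt.Println(oldRollingUpdateInProgress(window)) // prints "false"
}
```

The check returns `false` although `spec.replicas` (3) is greater than `status.availableReplicas` (2), which is exactly the window in which the controller wrongly considers the update finished.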
The idea of not relying on `status.unavailableReplicas` had already been discussed in !1 (comment 2650964739), but it was not identified as urgent to fix; the motivation identified then was to avoid using a deprecated field and to make the controller work for clusters that use maxSurge 1.
What this MR does is:
- for MachineDeployments (MDs), as in a commit previously proposed by @feleouet, use `spec.replicas - status.availableReplicas` instead of `status.unavailableReplicas`
- for the control plane, because there is no `status.availableReplicas` that we can use, use `status.unavailableReplicas`, but if that field is zero, use `spec.replicas - status.readyReplicas` (`readyReplicas` is a little less robust than `availableReplicas`, but this is the best we have to cover this corner case)
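The two rules above can be sketched in Go as follows. This is a minimal illustration, not the code of this MR: the `mdReplicas` and `cpReplicas` types and the `mdUnavailable` and `cpUnavailable` functions are hypothetical, and clamping negative results to zero is an assumption (for example, with maxSurge 1 there can briefly be more available Machines than `spec.replicas`).

```go
package main

import "fmt"

// Hypothetical simplified views of the fields named in this MR; these are
// NOT the Cluster API Go types.
type mdReplicas struct {
	SpecReplicas      int32 // spec.replicas
	AvailableReplicas int32 // status.availableReplicas
}

type cpReplicas struct {
	SpecReplicas        int32 // spec.replicas
	ReadyReplicas       int32 // status.readyReplicas
	UnavailableReplicas int32 // status.unavailableReplicas
}

// clampZero is an assumed safeguard: a surplus of Machines (e.g. a surge
// Machine) means nothing is missing, not a negative count.
func clampZero(n int32) int32 {
	if n < 0 {
		return 0
	}
	return n
}

// mdUnavailable counts desired MD replicas that are not available; unlike
// status.unavailableReplicas, this includes Machines not yet created.
func mdUnavailable(md mdReplicas) int32 {
	return clampZero(md.SpecReplicas - md.AvailableReplicas)
}

// cpUnavailable uses status.unavailableReplicas when it is non-zero, and
// otherwise falls back to spec.replicas - status.readyReplicas.
func cpUnavailable(cp cpReplicas) int32 {
	if cp.UnavailableReplicas > 0 {
		return cp.UnavailableReplicas
	}
	return clampZero(cp.SpecReplicas - cp.ReadyReplicas)
}

func main() {
	// Corner case: one Machine deleted, its replacement not created yet.
	fmt.Println(mdUnavailable(mdReplicas{SpecReplicas: 3, AvailableReplicas: 2}))                    // 1
	fmt.Println(cpUnavailable(cpReplicas{SpecReplicas: 3, ReadyReplicas: 2, UnavailableReplicas: 0})) // 1
	// Steady state: everything is ready.
	fmt.Println(cpUnavailable(cpReplicas{SpecReplicas: 3, ReadyReplicas: 3, UnavailableReplicas: 0})) // 0
}
```

In both cases the missing Machine is now counted as unavailable, so the controller keeps treating the rolling update as in progress during the window described above.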
This MR was tested in pipelines of sylva-projects/sylva-core!5368.