don't block on machines that are deleting but not NodeHealthy
This MR is meant to address sylva-projects/sylva-core!6523 (comment 3107952151)
(builds on top of !110 (merged))
Context
With the refactoring done in !99 (merged), when testing its integration in sylva-core I discovered a problematic corner case: the maxunavailable controller would prevent the deletion of a Machine if the corresponding Node is not healthy, because the NodeHealthy condition being false on the Machine results in fewer Machines being considered OK, and the controller has no notion that the not-healthy Machine is the one that would be deleted, and hence that deleting it would in fact not change how many usable Machines remain.
Solution
The solution that this MR implements is the following:
- instead of looking at how many Machines are currently "usable" (i.e. NodeHealthy and not draining) and requiring the total number of Machines to be usable before accepting to drain/delete one,
- the controller will now do a "what if" computation of how many "usable" Machines would remain if a given Machine were deleted, to decide whether the hook can be removed from that Machine
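The "what if" computation can be sketched as follows. This is a minimal illustration with a hypothetical `Machine` stand-in and a hypothetical `usableIfDeleted` helper, not the actual CAPI types or controller code:

```go
package main

import "fmt"

// Machine is a minimal, hypothetical stand-in for a CAPI Machine;
// only the fields relevant to this sketch are modeled.
type Machine struct {
	Name        string
	NodeHealthy bool
	Draining    bool
	Deleting    bool
}

// usableIfDeleted does the "what if" computation: it counts how many
// usable Machines (NodeHealthy and not draining) would remain if the
// candidate Machine were deleted.
func usableIfDeleted(machines []Machine, candidate string) int {
	remaining := 0
	for _, m := range machines {
		if m.Name == candidate {
			continue // pretend this Machine is already gone
		}
		if m.NodeHealthy && !m.Draining {
			remaining++
		}
	}
	return remaining
}

func main() {
	machines := []Machine{
		{Name: "md-1", NodeHealthy: false, Deleting: true}, // unhealthy, being deleted
		{Name: "md-2", NodeHealthy: true},
		{Name: "md-3", NodeHealthy: true},
	}
	// Deleting md-1 does not reduce the number of usable Machines,
	// because md-1 was not usable in the first place.
	fmt.Println(usableIfDeleted(machines, "md-1"))
}
```

This is what resolves the corner case described above: an unhealthy Machine does not count as usable today, so removing it leaves the usable count unchanged.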
Behavior change 1
One behavior of the controller was to never let a drain+delete occur on an MD Machine if the control-plane node rolling update wasn't finished.
Now, to solve the problem addressed by this MR, we also need to consider the following scenario:
- one MD Machine being deleted, and not NodeHealthy
- one CP Machine being deleted (and healthy)
In this situation, we have to let the controller accept the drain+delete of the MD Machine, or else we're in a deadlock: the CP Machine can't be deleted without leaving the cluster with too few usable Machines to satisfy the cluster-maxunavailable constraint.
The solution that I opted for is simply to:
- drop this behavior (accept the deletion of an MD Machine even before the CP is up to date)
- but, when picking a Machine to drain+delete, favor CP Machines
This is, I think, overall better:
- (simpler)
- prioritizing MD Machines is not necessary (it's critical for Kubernetes upgrades to respect version skew, but this is enforced by CAPI pre-flight checks)
- the behavior that I had chosen to implement was debatable in the first place, because unless we use sylva-capi-cluster `update_MDs_after_CP: true` it does not work reliably: a MachineDeployment change can trigger the deletion of an MD node before the change of XXXXControlPlane has even been processed by the XXXXControlPlane controller
- if a deployment chooses to use sylva-capi-cluster `update_MDs_after_CP: true`, then the fact that the cluster-maxunavailable controller gives preference to deleting CP Machines should ensure that CP Machines are updated first
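The CP-first preference when picking a Machine to drain+delete can be sketched as follows; the `Machine` struct and `pickMachineToDrain` helper below are hypothetical stand-ins for illustration, not the actual controller code:

```go
package main

import (
	"fmt"
	"sort"
)

// Machine is a minimal, hypothetical stand-in for a CAPI Machine.
type Machine struct {
	Name         string
	ControlPlane bool
}

// pickMachineToDrain sorts the deleting Machines so that control-plane
// Machines come first, then falls back to name order so the choice is
// deterministic, and returns the first one.
func pickMachineToDrain(deleting []Machine) *Machine {
	if len(deleting) == 0 {
		return nil
	}
	sort.Slice(deleting, func(i, j int) bool {
		if deleting[i].ControlPlane != deleting[j].ControlPlane {
			return deleting[i].ControlPlane // CP Machines sort first
		}
		return deleting[i].Name < deleting[j].Name
	})
	return &deleting[0]
}

func main() {
	deleting := []Machine{
		{Name: "md-1"},
		{Name: "cp-1", ControlPlane: true},
	}
	// The CP Machine is preferred even though it comes later in the list.
	fmt.Println(pickMachineToDrain(deleting).Name)
}
```

With this ordering, when both a CP and an MD Machine are deletable, the CP one is handled first, which is what makes dropping the old "block MD until CP is done" rule safe in practice.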
Behavior change 2
Until now, the controller would never select a Machine that still has other pre-drain hooks.
I could have kept this behavior, re-implemented as a `collections` filter, but I did not, because this criterion in fact feels like a bad idea: it's not really necessary or useful, and it could deadlock if some other controller were to decide to do the same. It would also lead to preferring an MD Machine when there is a CP Machine with a pre-drain hook (for instance, the rke2 CAPI provider has one), which would interfere with "Behavior change 1".
Again, this goes in the direction of simplifying this controller.
Implementation
I considered many implementation paths, and the most effective way to obtain something readable is to generalize the use of cluster-api `collections` filters across the controller, as started in !110 (merged). I think that the resulting code is much better (the key `selectMachineToDrain` function is 40 lines shorter and now below 100 lines; perhaps we could consider closing cluster-maxunavailable: break selectMachineToDr... (#6 - closed) when merging this MR?)
Overview of the changes made:
- implementation of the solution described above:
- introduce appropriate `collections.Func` filters
- combine them to determine the list of deleting Machines that can be drained
- use sorting to pick one
- simplification of `getCPInfos` into `getCPReplicas`
- with "Behavior change 1" described above, it's no longer necessary to fetch other information from the CP object
- simplification of the call to `selectMachineToDrain`, removing the `deletingMachines` parameter:
- the list of deleting Machines has to be computed anyway from the list of all Machines retrieved via the non-caching client
- and can be computed efficiently with `allMachines.Filter(collections.HasDeletionTimestamp)`
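As a rough sketch of the filter-combination pattern, the snippet below re-implements a tiny, hypothetical subset of cluster-api's `collections` package (the `Func`/`And`/`Filter` shapes) so the example stays self-contained; the predicates `hasDeletionTimestamp` and `hasOurHook` are illustrative names, not the controller's actual ones:

```go
package main

import "fmt"

// Machine is a minimal, hypothetical stand-in for a CAPI Machine.
type Machine struct {
	Name            string
	Deleting        bool
	HasPreDrainHook bool
}

// Func mirrors the shape of cluster-api's collections.Func predicate.
type Func func(m *Machine) bool

// And combines predicates, in the style of collections.And.
func And(filters ...Func) Func {
	return func(m *Machine) bool {
		for _, f := range filters {
			if !f(m) {
				return false
			}
		}
		return true
	}
}

// Filter mirrors collections.Machines.Filter, applied to a slice here.
func Filter(machines []Machine, f Func) []Machine {
	var out []Machine
	for i := range machines {
		if f(&machines[i]) {
			out = append(out, machines[i])
		}
	}
	return out
}

// Illustrative predicates for the sketch.
var (
	hasDeletionTimestamp Func = func(m *Machine) bool { return m.Deleting }
	hasOurHook           Func = func(m *Machine) bool { return m.HasPreDrainHook }
)

func main() {
	all := []Machine{
		{Name: "cp-1", Deleting: true, HasPreDrainHook: true},
		{Name: "md-1", Deleting: false},
		{Name: "md-2", Deleting: true, HasPreDrainHook: true},
	}
	// Deleting Machines that still carry our pre-drain hook are the
	// candidates among which one is then picked by sorting.
	candidates := Filter(all, And(hasDeletionTimestamp, hasOurHook))
	for _, m := range candidates {
		fmt.Println(m.Name)
	}
}
```

The point of the pattern is that each criterion becomes a small named predicate, and the candidate list is a single composed `Filter` call, which is what keeps `selectMachineToDrain` short.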
Review
Tests
This MR is tested via sylva-projects/sylva-core!6523 (merged)
Follow-ups
- this change will make it easier to adapt to CAPI 1.12 (because we no longer need to retrieve `status.xxxxReplicas` from the CP resource) - the kind of change done in !121 (closed) won't be necessary
- at some point I have in mind to let this controller process more than one Cluster at a time, which is currently prevented (`MaxConcurrentReconciles: 1`), but with the new code using sorting to pick a Machine among the deleting Machines, I think we can easily allow parallelism, since two parallel executions processing the same cluster will make the same decision
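The determinism argument can be illustrated with a minimal sketch: picking via sorting makes the choice independent of the order in which the Machines happen to be listed (the `pick` helper below is hypothetical, for illustration only):

```go
package main

import (
	"fmt"
	"sort"
)

// pick returns the lexicographically-first name: a deterministic choice,
// so two reconcilers seeing the same set of Machines (in any order)
// necessarily agree on which one to act on.
func pick(names []string) string {
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	return sorted[0]
}

func main() {
	fmt.Println(pick([]string{"md-2", "cp-1", "md-1"}))
	fmt.Println(pick([]string{"md-1", "md-2", "cp-1"})) // same result, different input order
}
```

In the real controller the sort key would be the CP-first preference plus a tiebreaker, but the property is the same: the decision is a pure function of the set of deleting Machines, not of list order or timing.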