Add support for ordered deletion of sylva units
Summary
Thanks to Flux Kustomization dependencies, we can deploy and upgrade the various units in the right order, but the Flux controllers do not honor these dependencies when the Kustomizations are deleted:
https://github.com/fluxcd/kustomize-controller/issues/301 https://github.com/fluxcd/flux2/issues/1744
Several options have been discussed here, but when revisiting the operator option we realized that it would be ineffective: adding another finalizer to Kustomizations would prevent their removal from the API, but it would not prevent the Flux controller from pruning the associated resources as soon as deletionTimestamp is set.
We have also explored using Kyverno to prevent the deletion of Kustomizations that still have dependents:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: kustomization-dependencies
  namespace: sylva-system
  annotations:
    kustomize.toolkit.fluxcd.io/force: Enabled
spec:
  validationFailureAction: Enforce
  rules:
    - name: kustomization-dependencies
      match:
        any:
          - resources:
              kinds:
                - kustomize.toolkit.fluxcd.io/*/Kustomization
              operations:
                - DELETE
      context:
        # List every Kustomization that names the one being deleted in spec.dependsOn
        - name: dependentsList
          apiCall:
            urlPath: "/apis/kustomize.toolkit.fluxcd.io/v1/kustomizations"
            jmesPath: "items[?contains(spec.dependsOn[].name || '', '{{ request.object.metadata.name }}')] |
              [?contains(spec.dependsOn[].namespace || '', '{{ request.object.metadata.namespace }}') || metadata.namespace == '{{ request.object.metadata.namespace }}'].metadata.name"
      validate:
        message: "the Kustomization {{ request.object.metadata.name }} cannot be deleted as it still has dependents: {{ join(',' , dependentsList) }}"
        deny:
          conditions:
            all:
              - key: "{{ length(dependentsList) }}"
                operator: GreaterThan
                value: 0
              - key: "{{ request.operation }}"
                operator: Equals
                value: DELETE
This policy works well and blocks delete requests as expected. However, the HelmRelease controller that manages sylva-units tries only once to delete all Kustomizations at the same time, then waits for them to be removed from the API, so it times out and does not retry.
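To make the intent of the dependents lookup concrete, here is a minimal Python sketch of the same check, run against a list shaped like the items returned by `kubectl get ks -A -o json`. The function name `dependents_of` and the sample unit names are hypothetical. Unlike the `contains` filters in the policy, the sketch matches each dependsOn entry exactly and applies the Flux rule that an entry without a namespace refers to the namespace of the Kustomization declaring it.

```python
def dependents_of(name, namespace, kustomizations):
    """Return names of Kustomizations whose spec.dependsOn references (namespace, name)."""
    found = []
    for ks in kustomizations:
        ks_ns = ks["metadata"]["namespace"]
        for dep in ks.get("spec", {}).get("dependsOn") or []:
            # A dependsOn entry without a namespace points to the namespace
            # of the Kustomization that declares it.
            dep_ns = dep.get("namespace") or ks_ns
            if dep["name"] == name and dep_ns == namespace:
                found.append(ks["metadata"]["name"])
                break
    return found


# Hypothetical sample data, for illustration only.
sample = [
    {"metadata": {"name": "cert-manager", "namespace": "sylva-system"}, "spec": {}},
    {"metadata": {"name": "vault", "namespace": "sylva-system"},
     "spec": {"dependsOn": [{"name": "cert-manager"}]}},
    {"metadata": {"name": "other", "namespace": "elsewhere"},
     "spec": {"dependsOn": [{"name": "cert-manager"}]}},
]
print(dependents_of("cert-manager", "sylva-system", sample))  # ['vault']
```

A delete request would be denied whenever this list is non-empty, which is what the `length(dependentsList)` condition expresses in the policy.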
Finally, the best option is probably to add a pre-delete hook to the sylva-units chart that deletes Kustomizations in reverse dependency order, dependents first.
We could, for example, run the following command in a loop to identify and delete resources that have no dependents:
kubectl get ks -n sylva-system -o yaml | yq '.items as $all | [$all.[].metadata.name] - [$all.[].spec.dependsOn[] | select(. | length > 0 and (.namespace == null or .namespace == "sylva-system")) | .name] | unique | .[]'
This should probably be refined to only include units living in the management cluster, since there is no point in deleting resources in workload clusters that will be pruned anyway.