Improve error management
What it does
To limit the amount of trace/log messages in case of error, the following improvements are implemented:
- A retry delay (`genericRetryInterval`) is applied before re-queuing for generic errors. These errors are the ones we do not know how to manage yet.
- The `conditions.MarkFalse` function is no longer called by the defer at the very end of the reconciliation loop. Instead, it is called where errors are caught.
- There is now a differentiation between cluster reconciliation and cluster deletion in the log / event / condition messages.
- Some errors are processed differently from the default behavior, because we know how to handle them:
  - When resources created by the operator are not yet ready, we requeue and wait
  - When resources created by the operator are being deleted, we requeue and wait
- About tests:
  - They are more precise with respect to error messages
  - Some bugs are fixed
We follow fluxCD error management (see this example). This work will be generalized to all errors we can manage.
References
Closes #34
Tests
Local cluster deployment / deletion on an OpenStack platform:
- Cluster deployment: the condition message and the logs are correct
```yaml
status:
  conditions:
  - lastTransitionTime: "2025-04-28T13:56:43Z"
    message: 'Failed to reconcile SylvaWorkloadCluster: The SylvaUnitsRelease resource
      is not Ready'
    observedGeneration: 1
    reason: ResourceNotReady
    status: "False"
    type: Ready
```

```
2025-04-28T14:24:57Z INFO The SylvaUnitsRelease resource is not Ready {"controller": "sylvaworkloadcluster", "controllerGroup": "workloadclusteroperator.sylva", "controllerKind": "SylvaWorkloadCluster", "SylvaWorkloadCluster": {"name":"wc1","namespace":"teama"}, "namespace": "teama", "name": "wc1", "reconcileID": "2472f0e4-3c2c-4dbe-81b8-5586739b6437"}
2025-04-28T14:24:57Z DEBUG events Failed to reconcile SylvaWorkloadCluster wc1: The SylvaUnitsRelease resource is not Ready {"type": "Normal", "object": {"kind":"SylvaWorkloadCluster","namespace":"teama","name":"wc1","uid":"800b2e47-c361-4e11-af63-31773d3cdd98","apiVersion":"workloadclusteroperator.sylva/v1alpha1","resourceVersion":"17788980"}, "reason": "ResourceNotReady"}
```
- Cluster deletion: the condition message and the logs are correct
```yaml
status:
  conditions:
  - lastTransitionTime: "2025-04-28T14:27:07Z"
    message: 'Failed to delete SylvaWorkloadCluster: The SylvaUnitsRelease is being
      deleted'
    observedGeneration: 2
    reason: ResourcePruningReason
    status: "False"
    type: Ready
```

```
2025-04-28T14:27:18Z INFO The SylvaUnitsRelease is being deleted {"controller": "sylvaworkloadcluster", "controllerGroup": "workloadclusteroperator.sylva", "controllerKind": "SylvaWorkloadCluster", "SylvaWorkloadCluster": {"name":"wc1","namespace":"teama"}, "namespace": "teama", "name": "wc1", "reconcileID": "c6077579-20aa-42a4-9480-ce52e62f0cf4"}
2025-04-28T14:27:18Z DEBUG events Failed to delete SylvaWorkloadCluster wc1: The SylvaUnitsRelease is being deleted {"type": "Normal", "object": {"kind":"SylvaWorkloadCluster","namespace":"teama","name":"wc1","uid":"800b2e47-c361-4e11-af63-31773d3cdd98","apiVersion":"workloadclusteroperator.sylva/v1alpha1","resourceVersion":"17920624"}, "reason": "ResourcePruningReason"}
```
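The event messages above follow a single pattern differing only in the verb (reconcile vs delete). A minimal sketch of that formatting, with the helper name being an assumption:

```go
package main

import "fmt"

// eventMessage builds the "Failed to ..." message used in the emitted
// events, differentiating cluster reconciliation from cluster deletion.
func eventMessage(name string, deleting bool, cause string) string {
	verb := "reconcile"
	if deleting {
		verb = "delete"
	}
	return fmt.Sprintf("Failed to %s SylvaWorkloadCluster %s: %s", verb, name, cause)
}

func main() {
	fmt.Println(eventMessage("wc1", false, "The SylvaUnitsRelease resource is not Ready"))
	fmt.Println(eventMessage("wc1", true, "The SylvaUnitsRelease is being deleted"))
}
```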