Improve error management

What it does

To limit the number of trace / log messages emitted when an error occurs, the following improvements are implemented:

  • A retry delay (genericRetryInterval) is applied before re-queuing on generic errors, i.e. the errors we do not know how to handle yet
  • The conditions.MarkFalse function is no longer called from the defer at the very end of the reconciliation loop. Instead, it is called where the errors are caught.
    • Log, event, and condition messages now distinguish between cluster reconciliation and cluster deletion
  • Errors that we know how to handle are no longer processed with the default behavior
    • When resources created by the operator are not yet ready, we requeue and wait
    • When resources created by the operator are being deleted, we requeue and wait
  • About tests
    • They check error messages more precisely
    • Some bugs are fixed

We follow the FluxCD approach to error management (see this example). This work will be generalized to all errors that we know how to handle.

References

Closes #34 (closed)

Tests

Local cluster deployment / deletion on an OpenStack platform (Sylva 1.3.9, SUR Operator 0.2.1):

  • Cluster deployment: the condition message and the logs are correct
status:
  conditions:
  - lastTransitionTime: "2025-04-28T13:56:43Z"
    message: 'Failed to reconcile SylvaWorkloadCluster: The SylvaUnitsRelease resource
      is not Ready'
    observedGeneration: 1
    reason: ResourceNotReady
    status: "False"
    type: Ready
2025-04-28T14:24:57Z    INFO    The SylvaUnitsRelease resource is not Ready     {"controller": "sylvaworkloadcluster", "controllerGroup": "workloadclusteroperator.sylva", "controllerKind": "SylvaWorkloadCluster", "SylvaWorkloadCluster": {"name":"wc1","namespace":"teama"}, "namespace": "teama", "name": "wc1", "reconcileID": "2472f0e4-3c2c-4dbe-81b8-5586739b6437"}
2025-04-28T14:24:57Z    DEBUG   events  Failed to reconcile SylvaWorkloadCluster wc1: The SylvaUnitsRelease resource is not Ready       {"type": "Normal", "object": {"kind":"SylvaWorkloadCluster","namespace":"teama","name":"wc1","uid":"800b2e47-c361-4e11-af63-31773d3cdd98","apiVersion":"workloadclusteroperator.sylva/v1alpha1","resourceVersion":"17788980"}, "reason": "ResourceNotReady"}
  • Cluster deletion: the condition message and the logs are correct
status:
  conditions:
  - lastTransitionTime: "2025-04-28T14:27:07Z"
    message: 'Failed to delete SylvaWorkloadCluster: The SylvaUnitsRelease is being
      deleted'
    observedGeneration: 2
    reason: ResourcePruningReason
    status: "False"
    type: Ready
2025-04-28T14:27:18Z    INFO    The SylvaUnitsRelease is being deleted  {"controller": "sylvaworkloadcluster", "controllerGroup": "workloadclusteroperator.sylva", "controllerKind": "SylvaWorkloadCluster", "SylvaWorkloadCluster": {"name":"wc1","namespace":"teama"}, "namespace": "teama", "name": "wc1", "reconcileID": "c6077579-20aa-42a4-9480-ce52e62f0cf4"}
2025-04-28T14:27:18Z    DEBUG   events  Failed to delete SylvaWorkloadCluster wc1: The SylvaUnitsRelease is being deleted       {"type": "Normal", "object": {"kind":"SylvaWorkloadCluster","namespace":"teama","name":"wc1","uid":"800b2e47-c361-4e11-af63-31773d3cdd98","apiVersion":"workloadclusteroperator.sylva/v1alpha1","resourceVersion":"17920624"}, "reason": "ResourcePruningReason"}
Edited by vladimir braquet
