Add a healthCheckExprs to ensure that loki-secrets contains at least one tenant definition.

What does this MR do and why?

1 - Add a healthCheckExprs to ensure that loki-secrets contains at least one tenant definition.

Since the secret content is a base64-encoded JSON string, it does not seem possible to inspect it in CEL, so we check its length instead, assuming that the encoded length grows as soon as a tenant definition is present:

❯ echo '{"loki":{"tenants":[]}}' | base64 | wc -c
33

❯ echo '{"loki":{"tenants":[{}]}}' | base64 | wc -c
37

❯ echo '{"loki":{"tenants":[{"a": "b"}]}}' | base64 | wc -c
49
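
The arithmetic behind these numbers can be reproduced outside the shell. The following is a minimal sketch, not part of this MR; the helper name `encoded_length` is hypothetical. It assumes the same pipeline as above, where `echo` appends a newline to the input and `base64` appends one to its output, so it only matches `wc -c` for documents short enough that `base64` does not wrap its output.

```python
import base64


def encoded_length(document: str) -> int:
    """Reproduce `echo DOC | base64 | wc -c` for a short document."""
    # echo appends a newline to its input before it reaches base64.
    raw = document.encode() + b"\n"
    # base64 prints a trailing newline as well, which wc -c counts.
    return len(base64.b64encode(raw)) + 1


if __name__ == "__main__":
    for doc in ('{"loki":{"tenants":[]}}', '{"loki":{"tenants":[{}]}}'):
        print(doc, encoded_length(doc))
```

Base64 output grows in steps of 4 characters per 3 input bytes, so the check relies on even the smallest tenant entry adding enough bytes to cross a step.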

2 - Additionally, in a second commit, add secret keys and their data lengths to debug-on-exit, as this can help debug specific cases like this one.
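
As an illustration of the kind of summary this produces, here is a minimal sketch that lists the keys of a secret with the length of each decoded value, without printing the values themselves. It is not the debug-on-exit implementation; the function name `secret_key_lengths` and the key `values` are hypothetical.

```python
import base64


def secret_key_lengths(data: dict[str, str]) -> dict[str, int]:
    """Map each secret key to the byte length of its decoded value.

    `data` is assumed to have the shape of the `data` field of a
    Kubernetes Secret: keys mapped to base64-encoded strings.
    """
    return {key: len(base64.b64decode(value)) for key, value in sorted(data.items())}


if __name__ == "__main__":
    # Hypothetical secret holding an empty tenant list under the key "values".
    sample = {"values": base64.b64encode(b'{"loki":{"tenants":[]}}').decode()}
    print(secret_key_lengths(sample))
```

Reporting lengths only keeps secret material out of the logs while still showing whether a key is suspiciously short.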

3 - Finally, in a third commit, add a CleanupPolicy that deletes any empty aggregated secret:

We've observed that loki-aggregated-secrets was sometimes empty even though the source secrets were present, so we need to work around this corner case. This can't be done by adding a precondition (checking whether tenants is empty) to the policy: it would prevent the policy from creating loki-secrets, and nothing would re-trigger the policy afterwards. Conversely, deleting the generated resource does re-trigger the policy, which is why we introduce this cleanup policy (only on first cluster installation) to force re-generation of the empty secret.

This was tested by inverting the policy condition (using GreaterThanOrEquals instead of LessThanOrEquals). The secret is then periodically deleted and re-created:
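
The decision the cleanup policy has to make can be sketched as follows. This is an illustration under assumptions, not the policy itself: the threshold, the function names, and the sample payloads are hypothetical, and the sketch assumes the secret value is compact JSON with no extra whitespace. It compares a length-only check (all a LessThanOrEquals condition can express) against actually decoding the JSON.

```python
import base64
import json

# Hypothetical threshold: encoded length of a secret with an empty tenant list.
EMPTY_DOC = '{"loki":{"tenants":[]}}'
EMPTY_LEN = len(base64.b64encode(EMPTY_DOC.encode()))


def should_cleanup(encoded_value: str, threshold: int = EMPTY_LEN) -> bool:
    """Length-only check, mirroring a LessThanOrEquals condition."""
    return len(encoded_value) <= threshold


def tenants_are_empty(encoded_value: str) -> bool:
    """Ground truth obtained by decoding the JSON, for comparison only."""
    doc = json.loads(base64.b64decode(encoded_value))
    return not doc.get("loki", {}).get("tenants")


if __name__ == "__main__":
    for raw in (EMPTY_DOC, '{"loki":{"tenants":[{"name":"t1"}]}}'):
        value = base64.b64encode(raw.encode()).decode()
        print(raw, should_cleanup(value), tenants_are_empty(value))
```

The two checks agree on these samples; they could diverge if the generated JSON were formatted differently, which is why the threshold is only a heuristic.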

$ k get secrets loki-secrets -w
NAME           TYPE     DATA   AGE
loki-secrets   Opaque   1      97m
loki-secrets   Opaque   1      0s
loki-secrets   Opaque   1      59s
loki-secrets   Opaque   1      0s
loki-secrets   Opaque   1      59s
loki-secrets   Opaque   1      0s

Related reference(s)

Closes: #2895 (closed)

Test coverage

CI configuration

Below you can choose test deployment variants to run in this MR's CI.

Legend:

Icon Meaning Available values
☁️ Infra Provider capd, capo, capm3
🚀 Bootstrap Provider kubeadm (alias kadm), rke2, okd, ck8s
🐧 Node OS ubuntu, suse, na, leapmicro
🛠️ Deployment Options light-deploy, dev-sources, ha, misc, maxsurge-0, logging, no-logging, openbao
🎬 Pipeline Scenarios Available scenario list and description
  • 🎬 preview ☁️ capd 🚀 kadm 🐧 ubuntu

  • 🎬 preview ☁️ capo 🚀 rke2 🐧 suse

  • 🎬 preview ☁️ capm3 🚀 rke2 🐧 ubuntu

  • ☁️ capd 🚀 kadm 🛠️ light-deploy 🐧 ubuntu

  • ☁️ capd 🚀 rke2 🛠️ light-deploy 🐧 suse

  • ☁️ capo 🚀 rke2 🐧 suse

  • ☁️ capo 🚀 rke2 🐧 leapmicro

  • ☁️ capo 🚀 kadm 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha,logging 🐧 suse

  • ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha,logging 🐧 ubuntu

  • ☁️ capo 🚀 kadm 🎬 rolling-update-no-wkld 🛠️ ha,logging 🐧 suse

  • ☁️ capo 🚀 kadm 🎬 rolling-update-no-wkld 🛠️ ha,logging 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capo 🚀 kadm 🎬 wkld-k8s-upgrade 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha 🐧 suse

  • ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.5.x 🛠️ ha 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.5.x 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🛠️ ha,misc,openbao 🐧 suse

  • ☁️ capm3 🚀 rke2 🐧 suse

  • ☁️ capm3 🚀 kadm 🐧 ubuntu

  • ☁️ capm3 🚀 ck8s 🐧 ubuntu

  • ☁️ capm3 🚀 kadm 🎬 rolling-update-no-wkld 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 wkld-k8s-upgrade 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 sylva-upgrade-from-1.5.x 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 rke2 🛠️ misc,ha 🐧 suse

  • ☁️ capm3 🚀 rke2 🎬 sylva-upgrade-from-1.5.x 🛠️ ha,misc 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 ck8s 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2|okd 🎬 no-update 🐧 ubuntu|na

Global config for deployment pipelines

  • autorun pipelines
  • allow failure on pipelines
  • record sylvactl events

Notes:

  • Enabling autorun makes deployment pipelines run automatically, without human interaction.
  • Disabling allow failure makes deployment pipelines mandatory for pipeline success.
  • If both autorun and allow failure are disabled, deployment pipelines need manual triggering but still block the pipeline.

Be aware: after a configuration change, the pipeline is not triggered automatically. Run it manually (by clicking the run pipeline button in the Pipelines tab) or push new code.

Edited by Francois Eleouet
