BaremetalHost.spec.online set to false by helm controller after pivot

Summary

While troubleshooting failures observed during the flux upgrade from 2.6.4 to 2.7.5 in !5717 (merged), we noticed that helm controller sets online=false when the cluster-bmh HelmRelease adopts the BaremetalHosts in management-cluster after pivot, which results in a cluster shutdown.

It's not yet clear why we did not observe this issue in the current release, since we have not been able to reproduce any behavior change in helm controller (between 1.3.0 and 1.4.5). When it adopts existing resources, it performs a helm install and overwrites all the resource fields defined in the chart. Drift detection configuration has no impact, as it only applies to upgrades, when a release is already installed.
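The overwrite on adoption can be illustrated with a toy model. This is only a sketch of the behavior reported here, not helm controller code: the function `adopt_with_install`, the dict layout, and the `bootMACAddress` value are hypothetical and chosen for illustration.

```python
import copy


def adopt_with_install(live: dict, chart: dict) -> dict:
    """Toy model of adoption: every field rendered by the chart replaces the live value.

    Fields that the chart does not render are left untouched.
    """
    result = copy.deepcopy(live)
    for key, value in chart.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Recurse into nested objects such as "spec".
            result[key] = adopt_with_install(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Live BaremetalHost after pivot: the host is powered on (illustrative values).
live_bmh = {"spec": {"online": True, "bootMACAddress": "00:00:00:00:00:01"}}
# The chart still renders its initial value for spec.online.
chart_bmh = {"spec": {"online": False}}

adopted = adopt_with_install(live_bmh, chart_bmh)
print(adopted["spec"]["online"])
```

In this model the adopted host ends up with the chart's value for `spec.online`, whatever the live value was, while fields the chart does not render are preserved.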

It seems preferable to stop setting the BaremetalHost.spec.online field in the chart, since we only want to provide an initial value for that field and hand it over to the metal3Machine controller after that.

As discussed in !5717 (comment 3004657767), we can instead:

  • prevent users from setting the bmc_spec.online field, using the schema
  • rely on a kyverno policy to set the initial value of the flag, optionally controlled by an annotation
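
The intended "initial value only" semantics of the second option could look roughly like the sketch below. It is plain Python that models the behavior, not a kyverno policy. The annotation name `example.org/initial-online`, the function `apply_initial_online`, and the `default` parameter are all assumptions made for illustration.

```python
import copy

# Hypothetical annotation name; the real one would be chosen when writing the policy.
INITIAL_ONLINE_ANNOTATION = "example.org/initial-online"


def apply_initial_online(bmh: dict, default: bool = False) -> dict:
    """Set spec.online only if it is absent, so later changes by a controller are kept."""
    result = copy.deepcopy(bmh)
    spec = result.setdefault("spec", {})
    if "online" in spec:
        # A value already exists (for example after pivot): leave it alone.
        return result
    annotations = result.get("metadata", {}).get("annotations", {})
    raw = annotations.get(INITIAL_ONLINE_ANNOTATION)
    # The annotation, when present, overrides the assumed default.
    spec["online"] = default if raw is None else raw.lower() == "true"
    return result


new_host = apply_initial_online({"metadata": {"name": "bmh-0"}})
annotated_host = apply_initial_online(
    {"metadata": {"name": "bmh-1", "annotations": {INITIAL_ONLINE_ANNOTATION: "true"}}}
)
running_host = apply_initial_online({"metadata": {"name": "bmh-2"}, "spec": {"online": True}})
```

Unlike an overwrite at install time, a host that already has `spec.online` set keeps its current value when this logic runs again.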