
Draft: LOW-670: fix polled devices UNKNOWN after re-establishing comms

Alex Hill requested to merge low-670-recomm-unknown-fix into main

When a polled device fails a poll, it goes into the UNKNOWN state, but when polling succeeds again, the device's state remains UNKNOWN.

It happens like this:

  1. Poll fails, communication is set to NOT_ESTABLISHED
  2. Poll resumes, communication is set to ESTABLISHED
  3. We end up in SKABaseDevice._communication_state_changed, where ESTABLISHED is explicitly ignored in order to wait for a state update
  4. We get a state update which, if it reports power ON, should finally bring the component state back to ON
  5. However, we never get that power ON update, because our power state never changed, and BaseDevice._component_state_changed only ever receives the changed bits of the component state.
  6. Device remains UNKNOWN 🥷

We can fix this by explicitly setting the power state to UNKNOWN when a poll fails. Then the first state update after comms resume counts as a change in power state, so _component_state_changed receives it and all is well.
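To make the mechanism concrete, here is a minimal toy model of the sequence above. It is a sketch only and does not use the real ska-tango-base code: `ToyDevice`, `ToyComponentManager`, `run_scenario` and the `reset_power_on_failure` flag are hypothetical names invented for illustration, and the only behaviour assumed is what this description states (ESTABLISHED is ignored, and only changed parts of the component state are forwarded).

```python
from enum import Enum


class Power(Enum):
    UNKNOWN = "UNKNOWN"
    ON = "ON"


class ToyDevice:
    """Hypothetical stand-in for the device; not the real SKABaseDevice."""

    def __init__(self):
        self.op_state = "UNKNOWN"

    def communication_state_changed(self, established):
        if not established:
            self.op_state = "UNKNOWN"
        # ESTABLISHED is deliberately ignored: wait for a state update.

    def component_state_changed(self, power=None):
        if power is not None:
            self.op_state = power.name


class ToyComponentManager:
    """Hypothetical stand-in for a polling component manager."""

    def __init__(self, device, reset_power_on_failure):
        self._device = device
        self._reset_power_on_failure = reset_power_on_failure
        self._state = {}  # last component state forwarded to the device

    def _update_component_state(self, **new_state):
        # Forward only the entries that differ from the cached state,
        # i.e. the "changed bits" of the component state.
        changed = {
            key: value
            for key, value in new_state.items()
            if self._state.get(key) != value
        }
        self._state.update(changed)
        if changed:
            self._device.component_state_changed(**changed)

    def poll_succeeded(self, power):
        self._device.communication_state_changed(established=True)
        self._update_component_state(power=power)

    def poll_failed(self):
        self._device.communication_state_changed(established=False)
        if self._reset_power_on_failure:
            # The fix: invalidate the cached power state so the next
            # successful poll is seen as a change.
            self._update_component_state(power=Power.UNKNOWN)


def run_scenario(reset_power_on_failure):
    """Poll OK, poll fails, poll OK again; return the final device state."""
    device = ToyDevice()
    manager = ToyComponentManager(device, reset_power_on_failure)
    manager.poll_succeeded(Power.ON)
    manager.poll_failed()
    manager.poll_succeeded(Power.ON)
    return device.op_state
```

In this model, `run_scenario(False)` leaves the device in UNKNOWN because the second power ON is filtered out as unchanged, while `run_scenario(True)` recovers to ON.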

I don't really like this fix and suspect we will stumble upon reasons to revisit it down the track, possibly for devices which have some idea of their power state independent of communication with the component (e.g. TPMs that get their power state from a subrack, or any device that gets its power state from a PDU).

However, I think it would be better to fix the real problem now and worry about the theoretical problem if and when it arises.

Edited by Alex Hill
