
Draft: [MCCS-1970] Fix state discovery bug

Alistair Child requested to merge mccs-1970-correct-state-discovery-bug into main

Bug description

1. Get a MccsTile into the On state.
2. Turn MccsTile adminMode OFFLINE.
# Observe: state correctly transitions to DISABLE.

3. Turn MccsTile adminMode ONLINE.
# Observe: state incorrectly stays in DISABLE.

When turning the Tile OFFLINE, the tile orchestrator calls stop_communicating_with_tpm and report_communication_disabled. This sets the communication state to DISABLED and then calls the TpmDriver's stop_communicating, which asks the driver to stop its polling loop. The loop stops only after the poll in progress has finished, and only then does the TpmDriver set its own communication state to DISABLED.

  • The first issue was that the TileComponentManager stayed in its previously known power state (ON or OFF) while the MccsTile was in DISABLE. So when we turned the Tile ONLINE again, the call from "_report_tpm_on" to TileComponentManager.update_component_state(power=PowerState.ON/OFF) did nothing, since the power state was unchanged. The fix was simply to set the TileComponentManager power state to UNKNOWN when we stop communicating.
  • The second issue was a set of race conditions. For example, although the TileOrchestrator had told the TileComponentManager that the TpmDriver was DISABLED, the TpmDriver might not yet have finished the poll that was in progress when stop_communicating was called. Its communication state is then still ESTABLISHED, so a call to start_communicating reaches:
    if self.communication_state == CommunicationStatus.ESTABLISHED:
        return

This means we never start communicating.
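
The first issue in the list above can be illustrated with a toy model. This is a minimal sketch, not the real code: ToyComponentManager, its notified list, the reset_power flag and the rediscover helper are hypothetical names, and it assumes that an update matching the cached power state is silently dropped.

```python
import enum


class PowerState(enum.Enum):
    UNKNOWN = 0
    OFF = 1
    ON = 2


class ToyComponentManager:
    """Hypothetical stand-in for TileComponentManager (not the real class)."""

    def __init__(self):
        self.power = PowerState.UNKNOWN
        self.notified = []  # power states actually pushed to the device

    def update_component_state(self, power):
        # Assumption: an update that matches the cached value is dropped.
        if power == self.power:
            return
        self.power = power
        self.notified.append(power)

    def stop_communicating(self, reset_power):
        if reset_power:
            # The fix described above: forget the last known power state.
            self.power = PowerState.UNKNOWN


def rediscover(reset_power):
    """Return the power updates seen after an OFFLINE -> ONLINE cycle."""
    manager = ToyComponentManager()
    manager.update_component_state(PowerState.ON)  # tile starts On
    manager.stop_communicating(reset_power)        # adminMode OFFLINE
    manager.notified.clear()
    manager.update_component_state(PowerState.ON)  # adminMode ONLINE
    return manager.notified


print(rediscover(reset_power=False))  # [] : the ON report is swallowed
print(rediscover(reset_power=True))   # [<PowerState.ON: 2>]
```

Without the reset, the rediscovered ON state equals the cached one, so nothing is reported and the device stays in DISABLE. With the reset, the same report is seen as a change.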

This MR attempts to fix some of these race conditions. It has been tested locally and at aavs3, and appears to correct the bug.
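
The race itself can be reproduced in a small threaded model. This is a sketch under stated assumptions: ToyDriver, poll_seconds, wait_for_stop and demo are hypothetical names, and joining the poll thread is only one possible way to close the race, not necessarily what this MR does.

```python
import enum
import threading
import time


class CommunicationStatus(enum.Enum):
    DISABLED = 0
    ESTABLISHED = 1


class ToyDriver:
    """Hypothetical stand-in for TpmDriver (not the real class)."""

    def __init__(self):
        self.communication_state = CommunicationStatus.ESTABLISHED
        self._poll_thread = None

    def stop_communicating(self, poll_seconds):
        # The state only flips to DISABLED once the in-flight poll finishes.
        def finish_poll():
            time.sleep(poll_seconds)
            self.communication_state = CommunicationStatus.DISABLED

        self._poll_thread = threading.Thread(target=finish_poll)
        self._poll_thread.start()

    def start_communicating(self, wait_for_stop):
        if wait_for_stop and self._poll_thread is not None:
            # Assumed mitigation: let the pending stop complete first.
            self._poll_thread.join()
        if self.communication_state == CommunicationStatus.ESTABLISHED:
            return False  # the guard fires; nothing is started
        self.communication_state = CommunicationStatus.ESTABLISHED
        return True

    def wait_until_stopped(self):
        if self._poll_thread is not None:
            self._poll_thread.join()


def demo(wait_for_stop):
    """Stop, then immediately start; report whether the start took effect."""
    driver = ToyDriver()
    driver.stop_communicating(poll_seconds=0.2)
    started = driver.start_communicating(wait_for_stop)
    driver.wait_until_stopped()
    return started, driver.communication_state


print(demo(wait_for_stop=False))
print(demo(wait_for_stop=True))
```

In the first call the start request is dropped and the late stop then leaves the driver DISABLED, which matches the behaviour in the bug description. In the second call the start waits for the stop to land and communication is re-established.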

Edited by Alistair Child
