lava_dispatcher: fix failure retry on action timeout failures (!2425)

Chase Qi requested to merge failure-retry into master Mar 25, 2024

When only a deploy/boot/test action-level timeout is defined, that timeout is applied to all of its sub-actions. A retry action is usually just a wrapper around a set of other actions. Forcing the sub-actions to respect the parent timeout breaks failure retry on action timeout errors at every level.

This change allows retry actions to retry within the job timeout. A common boot failure retry on a bootloader/OS boot timeout error will now just work.

A common use case:

- boot:
    failure_retry: 3
    method: qemu
    timeout:
      minutes: 2
    media: tmpfs
    prompts:
    - "root@debian:"
    auto_login:
      login_prompt: "login:"
      username: root

A nested retry action still needs to define a named action timeout so that timeout * retries + sleep is smaller than its parent timeout. Otherwise, the nested action may leave no time for the parent action to run when the child action passes only after more than one retry.

A working example:

- deploy:
    failure_retry: 3
    timeout:
      minutes: 10
    timeouts:
      http-download:
        minutes: 2
    to: tmpfs
    ...
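
The rule for nested retry actions (timeout * retries + sleep smaller than the parent timeout) can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of lava_dispatcher: the helper names are hypothetical, the retry count of 3 is assumed to apply to the nested `http-download` action, and the 30-second total sleep is an assumed value, since the example does not state one.

```python
from datetime import timedelta


def nested_retry_budget(action_timeout, retries, total_sleep=timedelta(0)):
    """Worst-case time a nested retry action can use up.

    Mirrors the rule from the description: timeout * retries + sleep.
    total_sleep is the combined sleep across all retries (assumed value).
    """
    return action_timeout * retries + total_sleep


def fits_in_parent(action_timeout, retries, parent_timeout, total_sleep=timedelta(0)):
    """True if the worst case is strictly smaller than the parent timeout."""
    return nested_retry_budget(action_timeout, retries, total_sleep) < parent_timeout


# Values from the deploy example: http-download 2 minutes, deploy 10 minutes.
http_download = timedelta(minutes=2)
deploy = timedelta(minutes=10)
sleep = timedelta(seconds=30)  # assumption, not taken from the job definition

print(nested_retry_budget(http_download, 3, sleep))
print(fits_in_parent(http_download, 3, deploy, sleep))
# A 4-minute named timeout would not fit: 3 * 4 minutes exceeds 10 minutes.
print(fits_in_parent(timedelta(minutes=4), 3, deploy))
```

With these numbers the nested action needs at most 6 minutes 30 seconds, which leaves the parent deploy action time to run within its 10 minutes.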

Edited Apr 25, 2024 by Chase Qi
