Harden Internal Events CLI specs against flakiness

Sarah Yasonik requested to merge sy-increase-resilience-of-cli-specs into master

What does this MR do and why?

  • Root problem: The spec's timeout fired before the example had finished executing
  • Fix: Increase the timeout duration. This should have no impact on test runtime when the specs pass, and it reduces flaky failures

Related issue: #435606 (closed)

All changes in this MR:

  1. Increases timeout duration for examples that gracefully close the CLI themselves (so our timeout doesn't break us out early)
  2. Separates the timeout for the whole example and the timeout for breaking out of the CLI
  3. Stops catching Timeout::Error so we don't swallow timeouts from the CLI code
  4. Chomps trailing terminal codes from end of interrupted CLI output
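
As a rough illustration of changes 2 and 3, here is a minimal sketch of keeping the two timeouts separate without rescuing Timeout::Error. It is not the code from this MR: run_with_timeouts, InteractionBreak, and the interaction_timeout keyword are hypothetical names chosen for the example. The idea is that the inner timeout raises its own error class, so the spec only rescues that class and any Timeout::Error raised by the CLI code itself still propagates.

```ruby
require 'timeout'

# Hypothetical error class, used only to break out of the CLI (assumption).
InteractionBreak = Class.new(StandardError)

# Runs the block under two separate limits:
# - interaction_timeout: breaks out of a CLI that is still waiting for input
# - example_timeout: upper bound on the whole example
def run_with_timeouts(example_timeout:, interaction_timeout:)
  Timeout.timeout(example_timeout) do
    begin
      # Passing a class makes Timeout raise it instead of Timeout::Error.
      Timeout.timeout(interaction_timeout, InteractionBreak) { yield }
    rescue InteractionBreak
      # Only our own break-out is rescued; Timeout::Error is left alone.
      :interrupted
    end
  end
end

# The block outlives the interaction timeout, so we break out early.
run_with_timeouts(example_timeout: 2, interaction_timeout: 0.05) { sleep 1 }
# => :interrupted

# The block finishes on its own, so its value is returned untouched.
run_with_timeouts(example_timeout: 2, interaction_timeout: 1) { :done }
# => :done
```

With this shape, a block that itself raises Timeout::Error (as the CLI code might) is not swallowed by the helper, and the example fails loudly instead.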

Why are there more changes than just increasing the timeout?

In debugging/verifying the fix, I found & fixed a couple of other sources of flakiness:

  1. Rescuing Timeout::Error in the spec also rescues the error raised in scripts/internal_events/cli/helpers/group_ownership.rb, but this race condition isn't hit on every run
  2. Depending on when the timeout breaks us out of CLI execution, there may be terminal-clearing key codes in the CLI output that the assertions didn't account for
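
To illustrate the second point, here is a small sketch of stripping terminal codes from the end of interrupted output before asserting on it. The helper name chomp_trailing_terminal_codes and the regex are assumptions made for the example, not the implementation in this MR, and the pattern only covers common ANSI CSI sequences (such as clear-line or cursor-move codes).

```ruby
# Matches one or more ANSI CSI escape sequences at the very end of a string,
# e.g. "\e[2K" (clear line) or "\e[1G" (move cursor to column 1).
TRAILING_ANSI = /(?:\e\[[0-9;?]*[A-Za-z])+\z/

# Hypothetical helper: removes only trailing codes, leaving any codes in the
# middle of the output untouched.
def chomp_trailing_terminal_codes(output)
  output.sub(TRAILING_ANSI, '')
end

chomp_trailing_terminal_codes("Select a group:\n\e[2K\e[1G")
# => "Select a group:\n"
```

Because only the tail is trimmed, assertions on the rest of the output behave the same whether or not the timeout interrupted the CLI mid-redraw.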

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

How to set up and validate locally

  1. Run the specs with varying values for the example_timeout & interaction_error to mimic the flakiness
