Proactively flag potential flakiness in feature specs
It would be nice if we could identify flaky tests before they fail in CI, or at least make it easier for developers to resolve them. Some ideas:
Automatically wait_for_requests
In a number of our feature specs, we have this pattern:
- Login
- Visit a page
- Click a button
- Click another button, refresh the page, or do something else
As we have seen in #384518 (closed), the solution is often to sprinkle in a `wait_for_requests` call to ensure the request finishes between button clicks; other issues show the same pattern.
Perhaps we could detect such cases with a RuboCop rule, or modify Capybara's default behavior to wait for requests to finish after a button click.
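The "wait after click" idea could be prototyped by prepending a module to the session class so every click is followed by a wait. A minimal sketch, using a stand-in session class and an assumed `wait_for_requests` helper rather than real Capybara:

```ruby
# Sketch only: wrap click_button so each click is followed by a wait for
# in-flight requests. In a real suite this module would be prepended to
# Capybara::Session; here a fake session stands in so the idea is runnable.
module AutoWaitAfterClick
  def click_button(locator)
    super
    wait_for_requests # assumed helper that blocks until AJAX settles
  end
end

# Stand-in for a Capybara session, for illustration only.
class FakeSession
  attr_reader :events

  def initialize
    @events = []
  end

  def click_button(locator)
    @events << [:click, locator]
  end

  def wait_for_requests
    @events << [:wait]
  end

  prepend AutoWaitAfterClick
end

session = FakeSession.new
session.click_button("Save")
p session.events # => [[:click, "Save"], [:wait]]
```

`Module#prepend` keeps the call site unchanged, so existing specs would pick up the wait without edits; the risk is hiding real timing bugs, which is why quarantining newly failing specs (see the work plan below) still matters.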
Associate all failed tests with HTTP logs and timestamps
In other cases, such as #384966 (comment 1200823662), race conditions appear when an unexpected HTTP request is made. Perhaps we could surface flakiness by recording the HTTP requests received by the server and displaying their timestamps in the RSpec backtrace. That might go a long way toward helping developers find problems.
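One way to record those requests would be a small Rack-style middleware in the test server's stack. A sketch under that assumption; `RequestRecorder` and the example path are hypothetical, and an RSpec failure hook would dump `RequestRecorder.log` alongside the backtrace:

```ruby
# Hypothetical sketch: a Rack-style middleware that records every request
# the test server receives, with a timestamp, so a failing example can be
# correlated with the HTTP log afterwards.
class RequestRecorder
  def self.log
    @log ||= []
  end

  def initialize(app)
    @app = app
  end

  def call(env)
    self.class.log << {
      at: Time.now,
      method: env["REQUEST_METHOD"],
      path: env["PATH_INFO"]
    }
    @app.call(env)
  end
end

# Illustration with a trivial inner app (a lambda honoring the Rack API).
app = RequestRecorder.new(->(env) { [200, {}, ["ok"]] })
status, _headers, _body =
  app.call("REQUEST_METHOD" => "GET", "PATH_INFO" => "/api/v4/projects")
```

Because the middleware only appends to an in-memory array, the overhead is negligible, and the log can be truncated per example in a `before` hook.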
Insert random delays in API responses
Many engineers have a hard time replicating flaky tests, yet often reproduction is just a matter of added latency in API or backend responses. Perhaps in local runs we should consider injecting random delays to see whether tests fail locally.
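Injecting delays could likewise be a Rack-style middleware, enabled only for local runs. `RandomDelay` and its parameters are hypothetical names, not an existing GitLab helper:

```ruby
# Hypothetical sketch: delay every response by a random amount (up to
# max_delay seconds) to shake out specs that only pass when the backend
# answers quickly. Accepting a Random instance keeps runs reproducible
# when seeded.
class RandomDelay
  def initialize(app, max_delay: 0.5, random: Random.new)
    @app = app
    @max_delay = max_delay
    @random = random
  end

  def call(env)
    sleep(@random.rand * @max_delay)
    @app.call(env)
  end
end

# Illustration with a trivial inner app and a tiny delay cap.
app = RandomDelay.new(->(env) { [200, {}, ["ok"]] }, max_delay: 0.01)
status, _headers, _body = app.call({})
```

Seeding the `Random` instance from the RSpec seed would make a delay-induced failure replayable with the same `--seed` flag.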
Randomly increase PostgreSQL sequence numbers
Some order-dependent tests don't fail until enough other specs have run. Perhaps we should consider randomly increasing the sequence numbers for primary keys, to ensure tests pass even when the database isn't reset to a totally clean state.
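Bumping sequences could be as simple as running one `setval` per primary-key sequence before the suite starts. A sketch that only generates the SQL (the helper name and offset range are arbitrary choices); in a Rails suite the statement would be executed through ActiveRecord's connection:

```ruby
# Hypothetical sketch: build SQL that advances a PostgreSQL sequence by a
# random offset, so specs that accidentally assume specific IDs (e.g.
# id == 1) fail locally too. setval/nextval are standard PostgreSQL
# sequence functions.
def randomize_sequence_sql(sequence_name, random: Random.new)
  offset = random.rand(1_000..100_000)
  "SELECT setval('#{sequence_name}', nextval('#{sequence_name}') + #{offset})"
end

sql = randomize_sequence_sql("projects_id_seq")
puts sql
```

In practice this would loop over every `*_id_seq` in the test database, e.g. by querying `information_schema.sequences`.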
Work plan and status
| Type | Suggestion | Work/MRs | Status/Progress |
|---|---|---|---|
| backend | Randomly increase PostgreSQL sequence numbers | | Exploration |
| documentation | Insert random delays in API responses | !107238 (merged), !107236 (merged) | |
| backend | Associate all failed tests with HTTP logs and timestamps | | Exploration |
| documentation | Update docs to quarantine a flaky test after the first failure | !116668 (merged) | |
| documentation | Document these cases under the right labels in the Testing guide | !118306 (merged), !119379 (merged) | |
| documentation | Suggest state leakage detection as a way to investigate and reproduce flaky tests that fail with timeouts and 404 Not Found errors | | |
| backend | Experiment with max timeout; this might be changed back to 30 seconds | !115671 (merged) | |
| backend | Forbid metadata on `shared_examples` and `shared_context` | !116657 (merged) | |
| backend | Add `wait_for_requests` after `visit` | !117158 (closed), !117246 (closed), !118193 (merged) | Exploration phase; I would suggest implementing it and quarantining the tests that surface as failing. From experience with the one failing, these are false positives and should be properly fixed |
| backend | Add `wait_for_all_requests` after `click_button` | !119638 (merged) | |
| backend | Add `wait_for_all_requests` after `click_link` | !120244 (merged) | |
| backend | Go back to a max timeout of 30 seconds after adding `wait_for_requests` | !120312 (closed) | |
| backend | Proposal to forbid the usage of conditionals in feature specs | !117152 (merged) | |
| documentation | Create a new label to track flaky tests that are too slow | | No further feedback so far; we can hold on this for now |