Proactively flag potential flakiness in feature specs

It would be nice if we could identify flaky tests before they fail in CI, or at least make it easier for developers to resolve them. Some ideas:

Automatically wait_for_requests

In a number of our feature specs, we have this pattern:

  1. Login
  2. Visit a page
  3. Click a button
  4. Click another button, refresh the page, or do something else

As we have seen in #384518 (closed), the solution is often to sprinkle in a wait_for_requests call to ensure the request finishes between button clicks.

Perhaps we can search for such cases with a RuboCop rule, or modify Capybara's default behavior to wait for requests to finish after a button click.
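
As a rough illustration of the second idea, a module prepended to Capybara's node class could append the wait to every click_button. This is a sketch, not the implementation in the MRs tracked below; it assumes the wait_for_requests helper from spec/support/helpers/wait_for_requests.rb is exposed as a WaitForRequests module.

```ruby
# Illustrative sketch only: make every click_button implicitly wait for
# in-flight requests to finish before the spec continues.
module ClickButtonWaitsForRequests
  include WaitForRequests # assumed helper module providing wait_for_requests

  def click_button(locator = nil, **options)
    super.tap { wait_for_requests }
  end
end

# Prepending keeps Capybara's original behavior and appends the wait.
Capybara::Node::Base.prepend(ClickButtonWaitsForRequests)
```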

Associate all failed tests with HTTP logs and timestamps

In other cases, such as #384966 (comment 1200823662), we can see race conditions that appear when an unexpected HTTP request is made. We might be able to find flakiness by recording the HTTP requests received by the server and displaying their timestamps in the RSpec backtrace. That could go a long way toward helping developers find problems.
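
A minimal sketch of that recording, using a small Rack middleware plus an RSpec hook that dumps the log on failure. The RequestLog name and the wiring are hypothetical, not something in the codebase:

```ruby
require 'concurrent'
require 'time'

# Illustrative sketch: log every request the test server receives, with a
# timestamp, and print the log when a feature spec fails.
class RequestLog
  ENTRIES = Concurrent::Array.new # server threads write concurrently

  def initialize(app)
    @app = app
  end

  def call(env)
    ENTRIES << "#{Time.now.iso8601(3)} #{env['REQUEST_METHOD']} #{env['PATH_INFO']}"
    @app.call(env)
  end
end

# Wire into the app under test, e.g. in config/environments/test.rb:
#   config.middleware.insert(0, RequestLog)

RSpec.configure do |config|
  config.before(:each, type: :feature) { RequestLog::ENTRIES.clear }

  config.after(:each, type: :feature) do |example|
    next unless example.exception

    warn "HTTP requests received during '#{example.full_description}':"
    RequestLog::ENTRIES.each { |entry| warn "  #{entry}" }
  end
end
```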

Insert random delays in API responses

Many engineers have a hard time reproducing flaky tests, but oftentimes reproducing them is just a matter of adding delays to API or backend responses. Perhaps we ought to consider adding random delays in local test runs to see whether tests fail locally.
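
For instance, a small opt-in Rack middleware could add jitter to API responses. The class name and environment flag below are made up for illustration:

```ruby
# Illustrative sketch: add up to half a second of jitter to API responses so
# race conditions surface locally. RandomApiDelay and RANDOM_API_DELAY are
# made-up names.
class RandomApiDelay
  def initialize(app, max_delay: 0.5)
    @app = app
    @max_delay = max_delay
  end

  def call(env)
    sleep(rand * @max_delay) if env['PATH_INFO'].start_with?('/api/')
    @app.call(env)
  end
end

# Opt in locally, e.g. in config/environments/test.rb:
#   config.middleware.insert(0, RandomApiDelay) if ENV['RANDOM_API_DELAY']
```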

Randomly increase PostgreSQL sequence numbers

Some order-dependent tests don't fail until enough other specs have run. Perhaps we should consider randomly increasing the sequence numbers behind primary IDs to ensure tests pass even when the database isn't reset to a totally clean state.
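
A sketch of what that could look like as a before(:suite) hook; it assumes PostgreSQL 10+ (for the pg_sequences view), and the offset range is arbitrary:

```ruby
# Illustrative sketch: offset every PostgreSQL sequence by a random amount so
# specs can't accidentally rely on predictable primary key values.
RSpec.configure do |config|
  config.before(:suite) do
    connection = ActiveRecord::Base.connection
    sequences = connection.select_values(
      "SELECT sequencename FROM pg_sequences WHERE schemaname = 'public'"
    )

    sequences.each do |sequence|
      offset = rand(1_000..10_000) # arbitrary jitter range
      connection.execute("SELECT setval('#{sequence}', nextval('#{sequence}') + #{offset})")
    end
  end
end
```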

Work plan and status

| Suggestion | Work/MRs | Status/Progress |
| --- | --- | --- |
| backend: Randomly increase PostgreSQL sequence numbers | | Exploration |
| documentation: Insert random delays in API responses | !107238 (merged), !107236 (merged) | |
| backend: Associate all failed tests with HTTP logs and timestamps | | Exploration |
| documentation: Update docs to quarantine a flaky test after its first failure | !116668 (merged) | |
| documentation: Document these cases under the right labels in the Testing guide | !118306 (merged), !119379 (merged) | |
| documentation: Suggest state leakage detection for investigating and reproducing flaky tests with timeouts and 404 Not Found objects | | |
| backend: Experiment with max timeout | !115671 (merged) | Might be changed back to 30 seconds |
| backend: Forbid metadata on shared_examples and shared_context | !116657 (merged) | |
| backend: Add wait_for_requests after visiting a page | !117158 (closed), !117246 (closed), !118193 (merged) | Exploration phase; the suggestion is to implement it and quarantine tests that surface as failing. Experience with the one failure so far suggests these are false positives that should be properly fixed |
| backend: Add wait_for_all_requests after click_button | !119638 (merged) | |
| backend: Add wait_for_all_requests after click_link | !120244 (merged) | |
| backend: Go back to a max timeout of 30 seconds after adding wait_for_requests | !120312 (closed) | |
| backend: Proposal to forbid the usage of conditionals in feature specs | !117152 (merged) | |
| documentation: Create a new label to track flaky tests that are too slow | | No further feedback on this; we can hold off for now |

/cc: @gitlab-org/quality/engineering-productivity
