Proactively flag potential flakiness in feature specs

It would be nice if we could identify flaky tests before they fail in CI, or at least make it easier for developers to resolve them. Some ideas:

Automatically wait_for_requests

In a number of our feature specs, we have this pattern:

  1. Login
  2. Visit a page
  3. Click a button
  4. Click another button, refresh the page, or do something else

As we have seen in #384518 (closed), the solution is often to sprinkle in a wait_for_requests call to ensure the request finishes between button clicks.

Perhaps we can search for such cases with a RuboCop rule, or modify Capybara's default behavior to wait for requests to finish after a button click.
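
As a rough illustration of the second idea, a module prepended to Capybara's node class could append the wait to every click_button. This is a sketch, not the implementation in the MRs tracked below; it assumes the wait_for_requests helper from spec/support/helpers/wait_for_requests.rb is exposed as a WaitForRequests module.

```ruby
# Illustrative sketch only: make every click_button implicitly wait for
# in-flight requests to finish before the spec continues.
module ClickButtonWaitsForRequests
  include WaitForRequests # assumed helper module providing wait_for_requests

  def click_button(locator = nil, **options)
    super.tap { wait_for_requests }
  end
end

# Prepending keeps Capybara's original behavior and appends the wait.
Capybara::Node::Base.prepend(ClickButtonWaitsForRequests)
```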

Associate all failed tests with HTTP logs and timestamps

In other cases, such as #384966 (comment 1200823662), we can see race conditions that appear when an unexpected HTTP request is made. We might be able to find flakiness by recording the HTTP requests received by the server and displaying their timestamps in the RSpec backtrace. That could go a long way toward helping developers find problems.
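
A minimal sketch of that recording, using a small Rack middleware plus an RSpec hook that dumps the log on failure. The RequestLog name and the wiring are hypothetical, not something in the codebase:

```ruby
require 'concurrent'
require 'time'

# Illustrative sketch: log every request the test server receives, with a
# timestamp, and print the log when a feature spec fails.
class RequestLog
  ENTRIES = Concurrent::Array.new # server threads write concurrently

  def initialize(app)
    @app = app
  end

  def call(env)
    ENTRIES << "#{Time.now.iso8601(3)} #{env['REQUEST_METHOD']} #{env['PATH_INFO']}"
    @app.call(env)
  end
end

# Wire into the app under test, e.g. in config/environments/test.rb:
#   config.middleware.insert(0, RequestLog)

RSpec.configure do |config|
  config.before(:each, type: :feature) { RequestLog::ENTRIES.clear }

  config.after(:each, type: :feature) do |example|
    next unless example.exception

    warn "HTTP requests received during '#{example.full_description}':"
    RequestLog::ENTRIES.each { |entry| warn "  #{entry}" }
  end
end
```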

Insert random delays in API responses

Many engineers have a hard time reproducing flaky tests, but oftentimes reproducing them is just a matter of adding delays to API or backend responses. Perhaps we ought to consider adding random delays in local test runs to see whether tests fail locally.
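
For instance, a small opt-in Rack middleware could add jitter to API responses. The class name and environment flag below are made up for illustration:

```ruby
# Illustrative sketch: add up to half a second of jitter to API responses so
# race conditions surface locally. RandomApiDelay and RANDOM_API_DELAY are
# made-up names.
class RandomApiDelay
  def initialize(app, max_delay: 0.5)
    @app = app
    @max_delay = max_delay
  end

  def call(env)
    sleep(rand * @max_delay) if env['PATH_INFO'].start_with?('/api/')
    @app.call(env)
  end
end

# Opt in locally, e.g. in config/environments/test.rb:
#   config.middleware.insert(0, RandomApiDelay) if ENV['RANDOM_API_DELAY']
```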

Randomly increase PostgreSQL sequence numbers

Some order-dependent tests don't fail until enough other specs have run. Perhaps we should consider randomly increasing the sequence numbers behind primary IDs to ensure tests pass even when the database isn't reset to a totally clean state.
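
A sketch of what that could look like as a before(:suite) hook; it assumes PostgreSQL 10+ (for the pg_sequences view), and the offset range is arbitrary:

```ruby
# Illustrative sketch: offset every PostgreSQL sequence by a random amount so
# specs can't accidentally rely on predictable primary key values.
RSpec.configure do |config|
  config.before(:suite) do
    connection = ActiveRecord::Base.connection
    sequences = connection.select_values(
      "SELECT sequencename FROM pg_sequences WHERE schemaname = 'public'"
    )

    sequences.each do |sequence|
      offset = rand(1_000..10_000) # arbitrary jitter range
      connection.execute("SELECT setval('#{sequence}', nextval('#{sequence}') + #{offset})")
    end
  end
end
```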

Work plan and status

| Suggestion | Work/MRs | Status/Progress |
| --- | --- | --- |
| backend: Randomly increase PostgreSQL sequence numbers | | Exploration |
| documentation: Insert random delays in API responses | !107238 (merged), !107236 (merged) | |
| backend: Associate all failed tests with HTTP logs and timestamps | | Exploration |
| documentation: Update docs to quarantine a flaky test after its first failure | !116668 (merged) | |
| documentation: Document these cases under the right labels in the Testing guide | !118306 (merged), !119379 (merged) | |
| documentation: Suggest state leakage detection for investigating and reproducing flaky tests with timeouts and 404 Not Found objects | | |
| backend: Experiment with max timeout | !115671 (merged) | Might be changed back to 30 seconds |
| backend: Forbid metadata on shared_examples and shared_context | !116657 (merged) | |
| backend: Add wait_for_requests after visiting a page | !117158 (closed), !117246 (closed), !118193 (merged) | Exploration phase; the suggestion is to implement it and quarantine tests that surface as failing. Experience with the one failure so far suggests these are false positives that should be properly fixed |
| backend: Add wait_for_all_requests after click_button | !119638 (merged) | |
| backend: Add wait_for_all_requests after click_link | !120244 (merged) | |
| backend: Go back to a max timeout of 30 seconds after adding wait_for_requests | !120312 (closed) | |
| backend: Proposal to forbid the usage of conditionals in feature specs | !117152 (merged) | |
| documentation: Create a new label to track flaky tests that are too slow | | No further feedback on this; we can hold off for now |

/cc: @gitlab-org/quality/engineering-productivity
