Flaky tests management
## :warning: Important :warning:

There is a new initiative to manage flaky tests holistically: [Flaky Test Reduction Initiative: Enhancing Reliability through Visibility, Automation, and Collaboration](https://gitlab.com/groups/gitlab-org/-/epics/16187)

---

*Re-created from https://gitlab.com/groups/gitlab-org/quality/-/epics/18.*

## Problem to solve

Flaky tests cause pipeline instabilities in MRs, `master`, and deployment pipelines. They reduce productivity and confidence in pipelines for all of Engineering and the wider community.

## Objectives

- Improve `master` stability without manual retries
- Improve productivity:
  - MR merge time
  - fewer retries needed during MWPS
  - MTTP & MTTR
  - fewer retries/new pipelines during deployment cycles
- Remove doubts about whether `master` is broken
- Define acceptable thresholds for quarantining/addressing flaky tests
- Move towards unlocking merge train

## Plan

1. :white_check_mark: [Pre-requisite] Categorize each test with `feature_category` metadata so that we can assign tests to groups: https://gitlab.com/groups/gitlab-org/-/epics/8890+s
1. :white_check_mark: Avoid flaky tests before they happen: https://gitlab.com/groups/gitlab-org/quality/engineering-productivity/-/epics/7+s
1. :white_check_mark: Provide a mechanism for fast-quarantining that doesn't require changing the codebase: https://gitlab.com/gitlab-org/quality/engineering-productivity/team/-/issues/204+
1. :white_check_mark: & :date: Detect flaky tests as soon as possible: we already have a strategy in place with https://docs.gitlab.com/ee/development/testing_guide/flaky_tests.html#automatic-retries-and-flaky-tests-detection, but we will probably extend it with https://gitlab.com/gitlab-org/gitlab/-/issues/361672+s
1. :white_check_mark: Track flaky tests efficiently and exhaustively:
   - https://gitlab.com/groups/gitlab-org/-/epics/10536+
   - https://gitlab.com/gitlab-org/gitlab/-/issues/361666+s
   - https://gitlab.com/gitlab-org/gitlab/-/issues/387984+s
1. :white_check_mark: Be able to provide an overview of the flaky tests we have: https://gitlab.com/gitlab-org/gitlab/-/issues/267487+s
1. :white_check_mark: Quarantine flaky tests in an automated way: https://gitlab.com/gitlab-org/gitlab/-/issues/442227+s
1. :white_check_mark: Ensure flaky tests are addressed by the group responsible for them: communicate broadly about the importance of resolving flaky tests.

Legend:

- :white_check_mark: work has been done
- :construction_site: work is in progress
- :date: work is yet to be done