Flaky tests management
## :warning: Important :warning:
A new initiative now manages flaky tests holistically: [Flaky Test Reduction Initiative: Enhancing Reliability through Visibility, Automation, and Collaboration](https://gitlab.com/groups/gitlab-org/-/epics/16187).
---
*Re-created from https://gitlab.com/groups/gitlab-org/quality/-/epics/18.*
## Problem to solve
Flaky tests cause pipeline instability in MRs, on `master`, and in deployment pipelines.
They reduce productivity and confidence in pipelines for all of Engineering and the wider community.
## Objectives
- Improve `master` stability without manual retries
- Improve productivity:
  - MR merge time: fewer retries needed during MWPS
  - MTTP & MTTR: fewer retries/new pipelines during deployment cycles
  - Remove doubts on whether `master` is broken or not
- Define acceptable thresholds for quarantining/addressing flaky tests
- Move towards unlocking merge train
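
The threshold objective could eventually be expressed as a small policy function. The sketch below is purely illustrative: the 5% and 1% cut-offs, the minimum of 20 runs, and every name in it (`FlakyStats`, `recommend_action`) are assumptions made for the example, not values or tooling agreed in this epic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlakyStats:
    """Recent run counts for one test (hypothetical record shape)."""
    test_id: str
    runs: int
    flaky_failures: int  # failures that passed on retry with no code change


# Hypothetical thresholds: the epic only states that thresholds should be defined.
QUARANTINE_RATE = 0.05
INVESTIGATE_RATE = 0.01
MIN_RUNS = 20  # do not act on too little data


def recommend_action(stats: FlakyStats) -> str:
    """Return 'quarantine', 'investigate', or 'none' for a test."""
    if stats.runs < MIN_RUNS:
        return "none"
    rate = stats.flaky_failures / stats.runs
    if rate >= QUARANTINE_RATE:
        return "quarantine"
    if rate >= INVESTIGATE_RATE:
        return "investigate"
    return "none"
```

Keeping the thresholds as named constants makes them easy to review and tune once real flakiness data is available.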
## Plan
1. :white_check_mark: [Pre-requisite] Categorize each test with `feature_category` metadata so that we can assign
   tests to groups: https://gitlab.com/groups/gitlab-org/-/epics/8890+s
1. :white_check_mark: Avoid flaky tests before they
happen: https://gitlab.com/groups/gitlab-org/quality/engineering-productivity/-/epics/7+s
1. :white_check_mark: Provide a mechanism for fast-quarantining that doesn't require changing the
codebase: https://gitlab.com/gitlab-org/quality/engineering-productivity/team/-/issues/204+
1. :white_check_mark: & :date: Detect flaky tests as soon as possible: we already have a strategy in place
   with https://docs.gitlab.com/ee/development/testing_guide/flaky_tests.html#automatic-retries-and-flaky-tests-detection,
   but we will probably extend it with https://gitlab.com/gitlab-org/gitlab/-/issues/361672+s
1. :white_check_mark: Track flaky tests efficiently and exhaustively:
   - https://gitlab.com/groups/gitlab-org/-/epics/10536+
   - https://gitlab.com/gitlab-org/gitlab/-/issues/361666+s
   - https://gitlab.com/gitlab-org/gitlab/-/issues/387984+s
1. :white_check_mark: Be able to provide an overview of the flaky tests we
have: https://gitlab.com/gitlab-org/gitlab/-/issues/267487+s
1. :white_check_mark: Quarantine flaky tests in an automated way: https://gitlab.com/gitlab-org/gitlab/-/issues/442227+s
1. :white_check_mark: Ensure flaky tests are addressed by the group responsible for them: communicate broadly about the
importance of resolving flaky tests
Legend:
- :white_check_mark: work has been done
- :construction_site: work is in progress
- :date: work is yet to be done
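
Two of the plan steps, detecting flakiness through retries and assigning flaky tests to groups via `feature_category`, can be pictured with a minimal sketch. It assumes a hypothetical `Attempt` record per test execution and treats a test as flaky when it both failed and passed on the same commit. It is not the actual GitLab tooling, only an illustration of the idea.

```python
from collections import defaultdict
from typing import NamedTuple


class Attempt(NamedTuple):
    """One execution of a test (hypothetical record shape)."""
    test_id: str
    feature_category: str
    commit_sha: str
    passed: bool


def detect_flaky(attempts):
    """Return ids of tests that both failed and passed on the same commit."""
    outcomes = defaultdict(set)
    for attempt in attempts:
        outcomes[(attempt.test_id, attempt.commit_sha)].add(attempt.passed)
    return {
        test_id
        for (test_id, _sha), seen in outcomes.items()
        if seen == {True, False}
    }


def group_by_category(attempts, flaky_ids):
    """Map each feature category to the sorted flaky test ids it owns."""
    grouped = defaultdict(set)
    for attempt in attempts:
        if attempt.test_id in flaky_ids:
            grouped[attempt.feature_category].add(attempt.test_id)
    return {category: sorted(ids) for category, ids in grouped.items()}
```

Grouping by category is what would let each flaky test be routed to the group responsible for it, which is the point of the `feature_category` pre-requisite.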