Commit 8bdb2628 authored by David Dieulivol

Clean up flaky tests handbook

parent fc4ca92f
+1 −1
@@ -137,7 +137,7 @@ our established request process:
#### Test Health and Pipeline Stability

- [Flaky Tests](flaky-tests/_index.md) - Automated detection and reporting
  - [Automated Reporting for the most flaky tests](flaky-tests/_index.md#automated-reporting-of-top-flaky-test-files) - Weekly assignments for high-impact flaky tests
  - [Reporting of Top Flaky Test Files](flaky-tests/_index.md#reporting-of-top-flaky-test-files) - Weekly assignments for high-impact flaky tests
- [Product Engineer guide to E2E test failure issues](guide-to-e2e-test-failure-issues.md)
- [Unhealthy Tests (Developer Docs)](https://docs.gitlab.com/development/testing_guide/unhealthy_tests/) - Technical debugging reference for GitLab contributors
- [🪄 Debug MR Test Failures with Duo](using-duo-to-debug-test-failures-in-mrs.md) - Use Duo to quickly diagnose and fix test failures in your MR
+19 −47
@@ -4,40 +4,17 @@ title: "Flaky tests"

## Introduction

This page describes GitLab's organizational process for detecting, reporting, and managing flaky tests. For technical guidance on debugging and fixing flaky tests, see [Unhealthy Tests (Developer Docs)](https://docs.gitlab.com/development/testing_guide/unhealthy_tests/). For quarantine procedures and syntax, see [Quarantining Tests (Developer Docs)](https://docs.gitlab.com/development/testing_guide/quarantining_tests/) and [Quarantine Process](../quarantine-process/).
This page describes GitLab's organizational process for detecting, reporting, and managing flaky tests. For technical guidance on debugging and fixing flaky tests, see [Unhealthy Tests (Developer Docs)](https://docs.gitlab.com/development/testing_guide/unhealthy_tests/). For quarantine procedures and syntax, see [Quarantine Process (Handbook)](../quarantine-process/) and [Quarantining Tests (Developer Docs)](https://docs.gitlab.com/development/testing_guide/quarantining_tests/).

A flaky test is an unreliable test that occasionally fails but passes eventually if you retry it enough times. Flaky tests can be a result of brittle tests, unstable test infrastructure, or an unstable application. We should try to identify the cause and remove the instability to improve quality and build trust in test results.
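As an illustration of the kind of brittleness that produces flakiness, here is a hypothetical RSpec example (not taken from the GitLab code base):

```ruby
# Hypothetical example of a brittle spec: the expectation depends on wall-clock
# timing, so it passes on most runs but fails whenever the clock crosses a second
# boundary or the CI runner is slow.
RSpec.describe 'build timestamps' do
  it 'finishes within the same second it started' do
    started_at = Time.now
    sleep(rand / 100.0) # stands in for a small amount of real work

    # Flaky assertion: retrying the job usually makes it pass, which hides the problem.
    expect(Time.now.to_i).to eq(started_at.to_i)
  end
end
```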

### Manual flow to detect flaky tests

When a flaky test fails in an MR, the author might follow a flow like this:

```mermaid
graph LR
    A[Test fails in an MR] --> C{Does the failure look related to the MR?}
    C -->|Yes| D[Try to reproduce and fix the test locally]
    C -->|No| E{Does a flaky test issue exist?}
    E -->|Yes| F[Retry the job and hope that it will pass this time]
    E -->|No| G[Wonder if this is flaky and retry the job]
```

## Why is flaky test management important?

- Flaky tests undermine trust in test results, leading engineers to dismiss genuine failures as flakiness.
- Manually retrying jobs until flaky tests pass, and investigating failures that turn out to be flaky, wastes significant engineering time.
- Managing flaky tests by quickly fixing the cause or removing the test from the test suite allows test time and costs to be used where they add value.

## Urgency Tiers and Response Timelines

Flaky tests are categorized by urgency based on their impact on pipeline stability:

- 🔴 **Critical**: 48 hours - Tests blocking critical workflows or affecting multiple teams
- 🟠 **High**: 1 week - Tests with significant pipeline impact
- 🟡 **Medium**: 2 weeks - Tests with moderate impact

These timelines guide when a test should be quarantined if it cannot be fixed. For quarantine procedures and technical implementation, see [Quarantine Process](../quarantine-process/) and [Quarantining Tests (Developer Docs)](https://docs.gitlab.com/development/testing_guide/quarantining_tests/).

## Automated Reporting of Top Flaky Test Files
## Reporting of Top Flaky Test Files

GitLab uses custom tooling to automatically identify and report the most impactful flaky test files that block CI/CD pipelines. The [ci-alerts automation](https://gitlab.com/gitlab-org/quality/analytics/ci-alerts) creates issues for test files causing repeated pipeline failures, which are then triaged and assigned to Engineering Managers for resolution.
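As a rough illustration of what "most impactful" means here, the sketch below ranks test files by how often they blocked a pipeline. It is not the actual ci-alerts implementation (see the documentation linked below), and the data shape is assumed:

```ruby
# Simplified sketch, not the real ci-alerts logic. `failures` stands in for rows
# pulled from ClickHouse: one entry per failed test job.
failures = [
  { file: 'spec/features/merge_request_spec.rb', pipeline_blocked: true },
  { file: 'spec/models/user_spec.rb',            pipeline_blocked: false }
  # ...
]

# Rank files by how many times they blocked a pipeline, keep the top 10.
top_flaky_files = failures
  .select { |f| f[:pipeline_blocked] }
  .group_by { |f| f[:file] }
  .transform_values(&:count)
  .sort_by { |_file, count| -count }
  .first(10)
```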

@@ -53,6 +30,10 @@ The ci-alerts system analyzes test failure data from ClickHouse to identify test

For detailed information about the classification algorithm and configuration, see the [ci-alerts flaky tests reporting documentation](https://gitlab.com/gitlab-org/quality/analytics/ci-alerts/-/blob/main/doc/flaky_tests_reporting.md).

### Frequency

We create top flaky test issues weekly (Sundays at 10:00 UTC).

### Triage Process

Issues created by the automation are triaged by the Development Analytics team and dispatched to the responsible Engineering Managers. The complete triage workflow is documented in the [ci-alerts TRIAGE.md](https://gitlab.com/gitlab-org/quality/analytics/ci-alerts/-/blob/main/TRIAGE.md).
@@ -61,7 +42,6 @@ Issues created by the automation are triaged by the Development Analytics team a

1. Initial triage to verify genuine flakiness
2. Dispatch to responsible product group with EM mention
3. 14-day follow-up with quarantine option if no action taken

### For Engineering Managers

@@ -69,12 +49,19 @@ If you've been assigned a top flaky test file issue:

1. **Review the issue description** - Contains impact metrics, Grafana dashboard link, and recommended actions
2. **Assess the situation** - Use the Grafana dashboard to understand failure patterns
3. **Take action within 14 days:**
   - Fix the root cause, or
   - Merge the provided quarantine MR to unblock pipelines while investigating, or
   - Request more time if actively working on a fix
3. **Take action** - See [Urgency Tiers and Response Timelines](#urgency-tiers-and-response-timelines) for timeline guidance

For guidance on quarantining tests, see the [Quarantine Process (Handbook)](../quarantine-process/) and [Quarantining Tests (Developer Docs)](https://docs.gitlab.com/development/testing_guide/quarantining_tests/).

For guidance on quarantining tests, see the [Quarantine Process](../quarantine-process/) and [Quarantining Tests (Developer Docs)](https://docs.gitlab.com/development/testing_guide/quarantining_tests/).
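For reference, quarantining is done through RSpec metadata that points at the tracking issue. The exact keys and accepted values are defined in the developer docs linked above; the snippet below is only a sketch, and the issue URL and feature category are placeholders:

```ruby
# Sketch only — follow Quarantining Tests (Developer Docs) for the authoritative
# syntax. The issue URL and feature category below are placeholders.
RSpec.describe 'Pipeline editor', feature_category: :pipeline_composition do
  it 'renders the lint results',
     quarantine: { issue: 'https://gitlab.com/gitlab-org/gitlab/-/issues/000000', type: :flaky } do
    # The test body stays in place; the metadata keeps the example out of regular
    # pipeline runs while the flakiness is investigated.
  end
end
```
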
### Urgency Tiers and Response Timelines

Flaky tests are categorized by urgency based on their impact on pipeline stability:

- 🔴 **Critical**: 48 hours - Tests blocking critical workflows or affecting multiple teams
- 🟠 **High**: 1 week - Tests with significant pipeline impact
- 🟡 **Medium**: 2 weeks - Tests with moderate impact

These timelines guide when a test should be quarantined if it cannot be fixed. For quarantine procedures and technical implementation, see [Quarantine Process (Handbook)](../quarantine-process/) and [Quarantining Tests (Developer Docs)](https://docs.gitlab.com/development/testing_guide/quarantining_tests/).

### What About Other Flaky Test Reporting Systems?

@@ -96,24 +83,9 @@ You may have noticed older flaky test issues with `flakiness::*` labels (e.g., `

The two systems will eventually be merged into a unified reporting mechanism.

### Technical Details

For developers and automation maintainers:

- **Source code:** [gitlab-org/quality/analytics/ci-alerts](https://gitlab.com/gitlab-org/quality/analytics/ci-alerts)
- **Classification algorithm:** [doc/flaky_tests_reporting.md](https://gitlab.com/gitlab-org/quality/analytics/ci-alerts/-/blob/main/doc/flaky_tests_reporting.md)
- **Triage workflow:** [TRIAGE.md](https://gitlab.com/gitlab-org/quality/analytics/ci-alerts/-/blob/main/TRIAGE.md)
- **Schedule:** Runs weekly on Sundays at 10:00 UTC

### Getting Help

For questions or support:

- **Slack:** [#g_development_analytics](https://gitlab.enterprise.slack.com/archives/C064M4D2V37)

## Additional resources

- [Detailed Quarantine Process](../quarantine-process.md) - Overall process for quarantined tests at GitLab
- [Quarantine Process (Handbook)](../quarantine-process/) - Overall process for quarantined tests at GitLab
- [Unhealthy Tests (Developer Docs)](https://docs.gitlab.com/development/testing_guide/unhealthy_tests/) - Technical reference for debugging and reproducing flaky tests
- [Quarantining Tests (Developer Docs)](https://docs.gitlab.com/development/testing_guide/quarantining_tests/) - Technical reference for quarantine syntax and implementation
- [Flaky tests dashboard](https://dashboards.devex.gitlab.net/d/ddjwrqc/flaky-tests-overview)
+1 −1
@@ -316,7 +316,7 @@ For more details, see the list with example issues in our

The most impactful flaky tests are automatically detected and reported directly to the Engineering Manager of the team which owns the `feature_category` of the test.
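As a sketch of how that ownership is expressed, `feature_category` is RSpec metadata on the top-level example group; the category value below is illustrative:

```ruby
# Illustrative only: the automation reads this metadata to find the owning group
# and its Engineering Manager.
RSpec.describe 'Projects::PipelinesController', feature_category: :continuous_integration do
  # ...
end
```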

See [Automated Reporting of Top Flaky Test Files](flaky-tests/_index.md#automated-reporting-of-top-flaky-test-files).
See [Reporting of Top Flaky Test Files](flaky-tests/_index.md#reporting-of-top-flaky-test-files).

To see which tests have already been identified as top flaky tests, view all [top flaky test file issues](https://gitlab.com/gitlab-org/quality/test-failure-issues/-/issues?sort=created_date&state=opened&label_name%5B%5D=automation%3Atop-flaky-test-file&first_page_size=100) in the `test-failure-issues` project.

+2 −2
@@ -55,7 +55,7 @@ Tests are identified for quarantine through automated detection and manual ident

### Automated processes

1. **Automated Flaky Test Reporting** (recommended) - Identifies the most impactful flaky test files and creates quarantine merge requests automatically, assigned to Engineering Managers based on `feature_category` metadata. See [Flaky Tests: Automated Reporting](./flaky-tests/#automated-reporting-of-top-flaky-test-files) for details.
1. **Automated Flaky Test Reporting** (recommended) - Identifies the most impactful flaky test files and creates quarantine merge requests automatically, assigned to Engineering Managers based on `feature_category` metadata. See [Flaky Tests: Reporting of Top Flaky Test Files](./flaky-tests/#reporting-of-top-flaky-test-files) for details.

2. **Test Failure Issues** (deprecated) - Automatically creates and updates GitLab issues when tests fail in CI pipelines. This system has significant shortcomings compared to the Automated Flaky Test Reporting system. See [comparison of the two systems](./flaky-tests/#what-about-other-flaky-test-reporting-systems) for details.

@@ -295,7 +295,7 @@ For information about flaky test tracking systems, metrics, and dashboards, see

## Related topics

- [Top Flaky Tests: Automated Reporting](./flaky-tests/_index.md#automated-reporting-of-top-flaky-test-files) - How the top flaky tests are automatically detected and reported
- [Top Flaky Tests: Reporting](./flaky-tests/_index.md#reporting-of-top-flaky-test-files) - How the top flaky tests are automatically detected and reported
- [Quarantine improvement initiative epic](https://gitlab.com/groups/gitlab-org/quality/-/epics/259) - Full context on the quarantine improvement project

## Get help