Phase 1: Reduce Functional test flakiness by updating Quarantined Test Process

## Part of https://gitlab.com/groups/gitlab-org/quality/-/epics/259+

The overall goal of this epic is to reduce flakiness in pipelines by ensuring we have a clear, efficient process for removing failing tests with a known/acceptable cause from the pipeline.

## Main areas of improvement, split into three sub-epics:

* [Quarantine Process and documentation](https://gitlab.com/groups/gitlab-org/quality/-/epics/258) -> Establishes clear, standardized procedures for the entire test quarantine lifecycle, from detection through removal. When teams understand their responsibilities and follow consistent processes, flaky tests get resolved faster rather than accumulating.
* [Address Fast Quarantine Violations and Streamline process](https://gitlab.com/groups/gitlab-org/quality/-/epics/257) -> Tackles the fast quarantine abuse problem where 32 tests are currently fast-quarantined. Fast quarantine is meant for immediate, temporary relief, not permanent solutions. This reduces the risk of stable branch failures when fast-quarantined tests are removed without proper backporting, and ensures flaky tests don't silently break pipelines when the fast quarantine expires.
* [Test feature_category violations](https://gitlab.com/groups/gitlab-org/quality/-/epics/275) -> Addresses the accountability gap where 38 quarantined tests have no owner and 5 have shared ownership. Without clear ownership, flaky tests never get fixed; they just accumulate. With 400+ tests having shared ownership and 3,600+ files with feature category violations, there is a massive diffusion of responsibility.

Note: Additional issues not in the above three workstreams are assigned to the top-level epic.
## Things to keep in mind:

- Treat every type of test in the same way (look into using the same tools, or align the way they deal with tests and later refactor to use the same tools)
- ~~Treat tests from different programming languages/frameworks in the same way~~ phase 2

<details>
<summary>

## Baseline Metrics

</summary>

### Quarantine Growth Analysis

| Test Level | 1 Year Ago (2024-01) | Current (2025-01) | 1-Year Growth (%) |
|------------|----------------------|-------------------|-------------------|
| **Unit Tests** (spec/models, spec/services, spec/lib, other) | **27** | **183** | **+578%** |
| **Integration/System Tests** (spec/features, spec/requests) | **131** | **205** | **+56%** |
| **E2E Tests** (qa/) | **61** | **92** | **+51%** |
| **TOTAL** | **219** | **480** | **+119%** |

**Important Note**: These statistics only count individual `:quarantine` tags. Multiple tests can be quarantined with a single tag, so the actual number of quarantined tests may be significantly higher than reported.
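
The growth percentages in the table above can be re-derived from the raw tag counts. The snippet below is a minimal sketch of that arithmetic, not the tooling that produced the baseline; the `counts` dictionary and the `growth_pct` helper are illustrative names, and the numbers are copied by hand from the table.

```python
# Quarantine tag counts copied from the table above: (one year ago, current).
# The dictionary layout and helper name are illustrative assumptions.
counts = {
    "Unit": (27, 183),
    "Integration/System": (131, 205),
    "E2E": (61, 92),
}


def growth_pct(before: int, after: int) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


total_before = sum(before for before, _ in counts.values())
total_after = sum(after for _, after in counts.values())

for level, (before, after) in counts.items():
    print(f"{level}: {before} -> {after} ({growth_pct(before, after):+.0f}%)")

print(f"TOTAL: {total_before} -> {total_after} "
      f"({growth_pct(total_before, total_after):+.0f}%)")
```

Because the inputs are tag counts rather than test counts, the same caveat applies here: the percentages describe growth in `:quarantine` tags, not necessarily in individual quarantined tests.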
### Fast Quarantine Growth

| Metric | 2 Years Ago (first implemented) | 1 Year Ago | Current (2025-01) | Year 1 Growth | Year 2 Growth | Concern |
|--------|---------------------------------|------------|-------------------|---------------|---------------|---------|
| **Fast Quarantined Tests** | 0 | 18 | **32** | +18 (n/a, from 0) | +14 (78%) | Tests left on fast quarantine indefinitely, backport issues when removing |

### Key Observations

- **Total quarantined tests more than doubled** in one year (119% growth)
- **Unit tests increased by 578%**
- **42.7%** of quarantined tests are Integration/System tests (features + requests)
- **38.1%** are Unit tests
- **19.2%** are E2E tests
- **Fast quarantine misuse still growing**: From 0 to 18 tests in the first year, then 78% growth to 32 tests in the second year
- Actual test count likely higher due to multiple tests per quarantine tag

### Additional Baseline Metrics to Track

| Metric | Current Value | Notes |
|--------|---------------|-------|
| Total quarantine tags | **480** | Up from 219 one year ago (119% growth) |
| Actual quarantined tests | _\[TBC - AUDIT NEEDED\]_ | Multiple tests can share a single tag |
| Tests in fast quarantine \>1 week | **32** | Increased from 0→18→32 over 2 years |
| Teams with \>n quarantined tests | _\[TBC - AUDIT NEEDED\]_ | Potential threshold for pre-milestone alerts |

</details>

### Participants

- @jay_mccure - DRI
- @willmeek
- @dchevalier2
- @tim_beauchamp
- @treagitlab

## Epic Closure: Phase 1 - Reduce Functional Test Flakiness by Updating Quarantined Test Process

#### The Original Problem

GitLab's test quarantine system was experiencing unsustainable growth and operational challenges:

* **Fast quarantine over-use**: 32+ tests were indefinitely fast-quarantined, creating a risk of stable branch failures when they were removed without proper backporting
* **Accountability gap**: 38 quarantined tests had no owner, 5 had shared ownership, and 720 spec files had invalid feature_categories, which made assignment of quarantine MRs and test-failure issues ineffective
* **Documentation fragmentation**: The quarantine process was scattered across multiple locations with inconsistent guidance. New documentation and communication of the `top-flaky-test` process were needed.

Without clear ownership and standardized processes, flaky tests were accumulating rather than being fixed or quarantined, creating technical debt that blocked engineers from shipping features.

#### Changes Made

This initiative delivered improvements across three major workstreams:

1. [**Address Fast Quarantine Violations and Streamline process**](https://gitlab.com/groups/gitlab-org/quality/-/work_items/257)
   * Automated reminder system for engineers to take action on fast quarantine MRs
   * Weekly cleanup job that forces permanent quarantine or test fixes, resetting fast quarantine to 0 each week
   * Comprehensive audit reducing violations from 32+ tests
   * Updated documentation and engineering communications
   * **Forced teams to confront failing tests** rather than ignoring them until they became a problem in stable branches ([example discussion](https://gitlab.slack.com/archives/CJZR6KPB4/p1769737503715499))
2. [**Update Quarantine Process description and related documentation**](https://gitlab.com/groups/gitlab-org/quality/-/work_items/258)
   * Completely overhauled the [quarantine process handbook](https://handbook.gitlab.com/handbook/engineering/testing/quarantine-process/) as the single source of truth
   * Documented the full quarantine lifecycle, including when and how to quarantine tests
   * Established clear decision criteria for quarantine types
   * Guided teams to leverage automated top flaky test reporting for quarantine decisions
   * Aligned timelines across handbook, dev docs, issue template, and auto quarantine MR template
3. [**Test Ownership - Address tests with feature_category violations**](https://gitlab.com/groups/gitlab-org/quality/-/work_items/275)
   * **Prevented :shared category growth**: RuboCop validation prevents new tests from using the `:shared` category
   * **Resolved 17 invalid categories** by recognizing maintained_categories, fixing typos, clarifying ownership with product groups, and gracefully handling obsolete categories
   * **Reduced spec files with unknown categories by 78%**: from 720 → 160 spec files (actual test count likely much higher, as one spec file can contain many tests)
   * **Resolved quarantined tests without ownership**: \[MISSING\] feature_category for quarantined tests reduced from 51 → 17 (67% reduction)
   * **Leveraged GitLab Duo to assign team ownership at scale**: Used LLM-assisted analysis to categorize 409 files with `:shared` ownership, distinguishing between "lazy shared" (should be owned by specific teams) and "genuine shared" (platform concerns). Created 47 MRs that successfully assigned 335 spec files to appropriate teams (actual test count likely much higher), with the remainder identified as legitimately shared infrastructure.
   * **Eliminated 37 RuboCop infractions** from `.rubocop_todo/rspec/feature_category.yml`, reducing technical debt

**Additional Improvements:**

* Migrated 731 quarantine issue links from the main gitlab project to the test-failure-issues project (https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/work_items/3976+)
* Deprecated the broken E2E quarantine Slack report (https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/work_items/3816+)
* Communicated the Top Flaky Test process to the engineering team (https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/work_items/4107+) via Slack message and engineering week-in-review

#### Shout-outs :raised_hands:

Huge thanks to all contributors: @willmeek, @dchevalier2, @tim_beauchamp, @treagitlab, @chloeliu and to the awesome ~"group::development analytics" team.

## Status

## Status 2026-02-04

:clock1: **total hours spent this week by all contributors**: 16

Linking the [Epic closure message](https://gitlab.com/groups/gitlab-org/quality/-/epics/195#note_3059938610) here, as it did not display correctly in the parent epic. Epic can be closed, thank you! 🙏

_Copied from https://gitlab.com/groups/gitlab-org/quality/-/epics/195#note_3056325085_