Phase 1: Reduce Functional test flakiness by updating Quarantined Test Process
## Part of https://gitlab.com/groups/gitlab-org/quality/-/epics/259+
The overall goal of this epic is to reduce flakiness in pipelines by ensuring we have a clear, efficient process for removing failing tests with a known/acceptable cause from the pipeline.
## Main areas of improvement, split into three sub-epics:
* [Quarantine Process and documentation](https://gitlab.com/groups/gitlab-org/quality/-/epics/258) - Establishes clear, standardized procedures for the entire test quarantine lifecycle, from detection through removal. When teams understand their responsibilities and follow consistent processes, flaky tests get resolved faster rather than accumulating.
* [Address Fast Quarantine Violations and Streamline process](https://gitlab.com/groups/gitlab-org/quality/-/epics/257) - Tackles the fast quarantine abuse problem where 32 tests are currently fast-quarantined. Fast quarantine is meant for immediate, temporary relief, not permanent solutions. This reduces the risk of stable branch failures when fast-quarantined tests are removed without proper backporting, and ensures flaky tests don't silently break pipelines when the fast quarantine expires.
* [Test feature_category violations](https://gitlab.com/groups/gitlab-org/quality/-/epics/275) - Addresses the accountability gap where 38 quarantined tests have no owner and 5 have shared ownership. Without clear ownership, flaky tests never get fixed; they just accumulate. With 400+ tests having shared ownership and 3,600+ files with feature category violations, there's massive diffusion of responsibility.
Note: Additional issues not in the above three workstreams are assigned to the top-level epic.
## Things to keep in mind:
- Treat every type of test in the same way: either use the same tools now, or first align how each tool handles tests and refactor to shared tooling later
- ~~Treat tests from different programming languages/frameworks in the same way~~ phase 2
<details>
<summary>
## Baseline Metrics
</summary>
### Quarantine Growth Analysis
| Test Level | 1 Year Ago (2024-01) | Current (2025-01) | 1 year % growth |
|------------|----------------------|-------------------|-----------------|
| **Unit Tests** (spec/models, spec/services, spec/lib, other) | **27** | **183** | **+578%** |
| **Integration/System Tests** (spec/features, spec/requests) | **131** | **205** | **+56%** |
| **E2E Tests** (qa/) | **61** | **92** | **+51%** |
| **TOTAL** | **219** | **480** | **+119%** |
**Important Note**: These statistics only count individual `:quarantine` tags. Multiple tests can be quarantined with a single tag, so the actual number of quarantined tests may be significantly higher than reported.
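
To illustrate why tag counts understate test counts, here is a minimal sketch of tag counting. It is not the script from the linked snippet; the regular expression, the function name `count_quarantine_tags`, and the sample spec are assumptions made for illustration only.

```python
import re

# Assumption: quarantine metadata appears either as a bare `:quarantine`
# symbol or as a `quarantine:` keyword inside RSpec metadata.
QUARANTINE_TAG = re.compile(r"(?<![\w:]):quarantine\b|\bquarantine:(?!:)")


def count_quarantine_tags(source: str) -> int:
    """Count quarantine tags in the source text of one spec file."""
    return len(QUARANTINE_TAG.findall(source))


# Hypothetical spec: a single tag on a context that wraps two examples.
sample_spec = """
RSpec.describe Widget do
  context 'when syncing', :quarantine do
    it 'creates a record' do; end
    it 'updates a record' do; end
  end
end
"""

tag_count = count_quarantine_tags(sample_spec)
print(f"quarantine tags: {tag_count}")
```

In this hypothetical file the count is one tag even though two examples are quarantined, which is the gap the audit of actual test counts is meant to close.
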
### Fast Quarantine Growth
| Metric | 2 Years Ago (first implemented) | 1 Year Ago | Current (2025-01) | Year 1 Growth | Year 2 Growth | Concern |
|--------|---------------------------------|------------|-------------------|---------------|---------------|---------|
| **Fast Quarantined Tests** | 0 | 18 | **32** | +18 (∞%) | +14 (78%) | Tests left on fast quarantine indefinitely, backport issues when removing |
### Key Observations
- **Total quarantined tests more than doubled** in one year (119% growth)
- **Unit tests increased by 578%**
- **42.7%** of quarantined tests are Integration/System tests (features + requests)
- **38.1%** are Unit tests
- **19.2%** are E2E tests
- **Fast quarantine misuse continues to grow**: From 0 to 18 tests in the first year, then 78% growth to 32 tests in the second year
- Actual test count likely higher due to multiple tests per quarantine tag
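
The growth and share figures above can be reproduced from the tag counts in the Quarantine Growth Analysis table. The following is a small sketch; the variable and function names are illustrative and not part of any existing tooling.

```python
# Tag counts per test level: (2024-01, 2025-01), taken from the table above.
counts = {
    "Unit": (27, 183),
    "Integration/System": (131, 205),
    "E2E": (61, 92),
}


def pct_growth(before: int, after: int) -> float:
    """Percentage growth from `before` to `after`."""
    return (after - before) / before * 100


total_before = sum(before for before, _ in counts.values())
total_after = sum(after for _, after in counts.values())

# One-year growth per level, rounded to whole percent as in the table.
growth = {level: round(pct_growth(b, a)) for level, (b, a) in counts.items()}
total_growth = round(pct_growth(total_before, total_after))

# Share of current quarantine tags per level, to one decimal place.
share = {level: round(a / total_after * 100, 1) for level, (_, a) in counts.items()}

for level in counts:
    print(f"{level}: +{growth[level]}% growth, {share[level]}% of current tags")
print(f"TOTAL: {total_before} -> {total_after} (+{total_growth}%)")
```
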
### Additional Baseline Metrics to Track
| Metric | Current Value | Notes |
|--------|---------------|-------|
| Total quarantine tags | **480** | Up from 219 one year ago (119% growth) |
| Actual quarantined tests | _\[TBC - AUDIT NEEDED\]_ | Multiple tests can share single tag |
| Tests in fast quarantine \>1 week | **32** | Increased from 0→18→32 over 2 years |
| Teams with \>n quarantined tests | _\[TBC - AUDIT NEEDED\]_ | Potential threshold for pre-milestone alerts |
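
The "Tests in fast quarantine >1 week" metric could be computed along the following lines. This is a sketch under a stated assumption: each fast-quarantine entry is paired with the date it was added. That record format, the test paths, and the name `overdue_entries` are hypothetical and may not match how fast quarantine data is actually stored.

```python
from datetime import date, timedelta

# The 1-week limit referenced in the metrics table.
FAST_QUARANTINE_LIMIT = timedelta(weeks=1)


def overdue_entries(entries, today):
    """Return test ids that have been fast-quarantined longer than the limit."""
    return [
        test_id
        for test_id, added_on in entries
        if today - added_on > FAST_QUARANTINE_LIMIT
    ]


# Hypothetical entries: (test id, date added to fast quarantine).
entries = [
    ("spec/models/example_a_spec.rb", date(2025, 1, 2)),
    ("spec/features/example_b_spec.rb", date(2025, 1, 20)),
]

overdue = overdue_entries(entries, today=date(2025, 1, 22))
print(overdue)
```

A list like `overdue` could feed a reminder or a conversion to long-term quarantine, depending on what the team decides.
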
</details>
<!---
### Implementation Plan TODO: to be reworked with the team
<details>
<summary>
#### Address Fast Quarantine Violations and Streamline process
</summary>
- Audit 32 tests exceeding 1-week fast quarantine limit
- One off effort converting all to permanent quarantine
- Notify teams and request action plans (fix/delete/convert)
- Implement automated enforcement MR/issue assigned to product team
- Target: Maintain \<10 fast quarantined tests ongoing
</details>
<details>
<summary>
#### Enable Team Ownership
</summary>
- Define ownership model for all test levels
- Create short training video explaining the quarantine process
- Create handbook documentation with fix vs. delete decision trees
- Feature no longer exists or is deprecated
- Test provides redundant coverage
- Cost of fixing exceeds value of test coverage
- Test covers edge cases no longer relevant to users
- Link to flaky test documentation
- Add documentation links in quarantine triage report
</details>
<details>
<summary>
#### Future Phase: Implement Proactive Notifications of Quarantined tests
</summary>
- Roll out EM notification system based on Phase 1 feedback:
- Pre-milestone planning reports for teams above thresholds
- Direct links to quarantined tests
- Link to Quality Insights dashboards
- Deploy unified alerting system with configurable thresholds. (Monthly milestone triage report)
</details>
<details>
<summary>
#### Future Phase: Monitor and Adjust
</summary>
- Monthly progress reviews against targets
- 6-month target review (480→432)
- 12-month target assessment (\<400)
- Adjust thresholds and SLAs based on effectiveness
- Leverage Feature Readiness dashboards to showcase metrics
- Provide FY26 Product Quality Standup updates with relevant Quarantine information
- Provide additional support to teams missing targets
- Final reporting and lessons learned
</details>
## Exit Criteria TODO: to be reworked with the team
- [ ] **Audit completed**: Actual test count vs. tag count documented
- [ ] Consult Test Governance team for feedback and ideas
- [ ] **E2E quarantine reporting Slack notification fixed or removed** - reports function without breaking due to volume (Issue https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/3816 resolved)
- [ ] **Engineering Manager notification system implemented**:
- [ ] Consultation with EMs completed and requirements gathered
- [ ] **Pre-milestone planning reports automated for teams above thresholds**
- [ ] Automated reminder system implemented for fast quarantine tests (1-week threshold) with team-specific escalation/auto-MRs assigned to teams
- [ ] **Training and documentation completed**:
- [ ] Comprehensive training video created and distributed
- [ ] Process documentation updated with flowcharts and decision trees
- [ ] **Clear guidance on when to fix tests, and when to delete tests**
- [ ] Single investigation guide covering all test levels
## Related Epics and Issues
- [Epic &214: Make quarantine rates available for teams](https://gitlab.com/groups/gitlab-org/quality/-/epics/214) - Will provide visibility dashboards for this initiative to leverage
- [Epic &212: Introduce quarantined test metric](https://gitlab.com/groups/gitlab-org/quality/-/epics/212) - Will create metrics that this epic will use to showcase data to engineering teams
- [Issue #3816: E2E quarantine report exceeding character limits](https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/3816) - Evidence of unsustainable quarantine growth
- [Snippet #4882239: Script for quarantine statistics](https://gitlab.com/-/snippets/4882239) - Script used to gather baseline quarantine data for this epic
## Dependencies
- **Engineering Management for consultation, adoption, and accountability**
- For creating effective notification system
- Essential for driving team-level reductions
- Developer Analytics team for statistics infrastructure
- Quality Insights for dashboards
- **Updates to gitlab quality test tooling** - to align test level reporting
## Resources and Tools
- [**Quarantine statistics script**](https://gitlab.com/-/snippets/4882239) - Script for gathering quarantine metrics and tracking growth patterns across all test levels
- Dashboards from Epic &212 and &214 (once available)
- FY26 Product Quality Standup for regular metric reviews
---
## Notes
- **Key success factors**:
- All test levels should be managed through the same quarantine system, rather than separate tools or processes.
- **Strategic timing of EM notifications before milestone planning** - ensures quarantine reduction gets proper resource allocation
- Engineering Manager engagement is important as they drive team priorities and resource allocation
- **Sustained reduction at 6 and 12 months** demonstrates process effectiveness
- Consider implementing automated de-quarantine for tests passing consistently
- Focus on making the process proactive rather than reactive
- **Milestone planning integration ensures quarantine reduction becomes planned work, not just reactive fixes**
- Track quarantine velocity (rate of new quarantines vs. removals) per test level to identify systemic issues
- This epic focuses on managing and reducing existing quarantined tests, not on preventing test flakiness
- Issues linked in the codebase in quarantine metadata may be closed (this epic does not plan to use these issues, so it may not be a concern)-->
### Participants
- @jay_mccure - DRI
- @willmeek
- @dchevalier2
- @tim_beauchamp
- @treagitlab
## Epic Closure: Phase 1 - Reduce Functional Test Flakiness by Updating Quarantined Test Process
#### The Original Problem
GitLab's test quarantine system was experiencing unsustainable growth and operational challenges:
* **Fast quarantine over-use**: 32+ tests were indefinitely fast-quarantined, creating risk of stable branch failures when removed without proper backporting
* **Accountability gap**: 38 quarantined tests had no owner, 5 had shared ownership, and 720 spec files had invalid feature_categories, making assignment of quarantine MRs and test-failure issues ineffective
* **Documentation fragmentation**: The quarantine process was scattered across multiple locations with inconsistent guidance. New documentation and communication of the `top-flaky-test` process were also needed.
Without clear ownership and standardized processes, flaky tests were accumulating rather than being fixed or quarantined, creating technical debt that blocked engineers from shipping features.
#### Changes Made
This initiative delivered improvements across three major workstreams:
1. [**Address Fast Quarantine Violations and Streamline process**](https://gitlab.com/groups/gitlab-org/quality/-/work_items/257)
* Automated reminder system for engineers to take action on fast quarantine MRs
* Weekly cleanup job that forces permanent quarantine or test fixes, resetting fast quarantine to 0 each week
* Comprehensive audit reducing violations from 32+ tests
* Updated documentation and engineering communications
* **Forced teams to confront failing tests** rather than ignoring them until they became a problem in stable branches ([example discussion](https://gitlab.slack.com/archives/CJZR6KPB4/p1769737503715499))
2. [**Update Quarantine Process description and related documentation**](https://gitlab.com/groups/gitlab-org/quality/-/work_items/258)
* Completely overhauled [quarantine process handbook](https://handbook.gitlab.com/handbook/engineering/testing/quarantine-process/) as single source of truth
* Documented full quarantine lifecycle including when and how to quarantine tests
* Established clear decision criteria for quarantine types
* Guided teams to leverage automated top flaky test reporting for quarantine decisions
* Aligned timelines across handbook, dev docs, issue template, and auto quarantine MR template
3. [**Test Ownership - Address tests with feature_category violations**](https://gitlab.com/groups/gitlab-org/quality/-/work_items/275)
* **Prevented :shared category growth**: Rubocop validation prevents new tests from using `:shared` category
* **Resolved 17 invalid categories** by recognizing maintained_categories, fixing typos, clarifying ownership with product groups, and gracefully handling obsolete categories
* **Reduced spec files with unknown categories by 78%**: from 720 → 160 spec files (actual test count likely much higher as one spec file can contain many tests)
* **Resolved quarantined tests without ownership**: \[MISSING\] feature_category for quarantined tests reduced from 51 → 17 (67% reduction)
* **Leveraged GitLab Duo to assign team ownership at scale**: Used LLM-assisted analysis to categorize 409 files with `:shared` ownership, distinguishing between "lazy shared" (should be owned by specific teams) vs "genuine shared" (platform concerns). Created 47 MRs that successfully assigned 335 spec files to appropriate teams (actual test count likely much higher), with the remainder identified as legitimately shared infrastructure.
* **Eliminated 37 rubocop infractions** from `.rubocop_todo/rspec/feature_category.yml`, reducing technical debt
**Additional Improvements:**
* Migrated 731 quarantine issue links from the main gitlab project to test-failure-issues project (https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/work_items/3976+)
* Deprecated the broken E2E quarantine Slack report (https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/work_items/3816+)
* Communicated the Top Flaky Test process to the engineering team (https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/work_items/4107+) via Slack message and engineering week-in-review
#### Shout-outs :raised_hands:
Huge thanks to all contributors: @willmeek, @dchevalier2, @tim_beauchamp, @treagitlab, @chloeliu and to the awesome ~"group::development analytics" team.
## Status
<!-- STATUS NOTE START -->
## Status 2026-02-04
:clock1: **total hours spent this week by all contributors**: 16
Linking the [epic closure message](https://gitlab.com/groups/gitlab-org/quality/-/epics/195#note_3059938610) here, as it did not display correctly in the parent epic. The epic can be closed, thank you! 🙏
_Copied from https://gitlab.com/groups/gitlab-org/quality/-/epics/195#note_3056325085_
<!-- STATUS NOTE END -->