Eliminate SET On-call: Phase 2: Live Environment Pipeline Frequency Reduction
# Summary
This epic tracks Phase 2 of the effort to eliminate the current [Test Platform On-call DRI process](https://handbook.gitlab.com/handbook/engineering/infrastructure/test-platform/oncall-rotation/).
Please see the main issue [Eliminate SET On-call Through Tooling Improvement and Developer Enablement](https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/2543) for more details.
## Phase 2: Live Environment Pipeline Frequency Reduction [ ~"FY25::Q4" - ~"FY26::Q1" ]
* Current State:
  * Multiple environment validations (staging/staging-ref/production). If E2E tests are the only way we validate production stability, that points to a gap in our observability and monitoring systems: production health and performance metrics may not be effectively captured through standard SRE processes.
  * High environmental flakiness
  * Redundant coverage with `test-on-cng`, which runs in MR pipelines
* Implementation Path:
  * :white_check_mark: [Remove Staging Ref E2E Test Pipelines](https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/3208)
  * :clock3: [Reduce Production and Production Canary E2E Test Pipelines](https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/3302)
  * :clock3: Monitor impact via [Analyze Impact on Issues and Incidents After Reducing Production E2E Pipelines](https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/3282)
  * :pencil: [Reduce Staging and Staging Canary E2E Test Pipelines](https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/3332)
  * :pencil: Monitor impact via [Analyze Impact on Issues and Incidents After Reducing Staging E2E Pipelines](https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/3333)
### Business Impact
* **Reduced On-Call Load**: Frees up SETs for strategic work by minimizing active monitoring.
* **Faster Feedback**: Accelerates development with quicker, essential test validations.
* **Increased Stability**: Reduces environmental flakiness for more reliable test outcomes.
* **Cost Savings**: Optimizes resource usage by cutting down on redundant executions.
### Legend
- :white_check_mark: Completed
- :clock3: In Progress
- :pencil: Pending/Planning
<!-- STATUS NOTE START -->
## Status 2025-04-30
All child items are closed. This epic is done.
_Copied from https://gitlab.com/groups/gitlab-org/-/epics/16167#note_2476951980_
<!-- STATUS NOTE END -->