Eliminate SET On-call: Phase 2: Live Environment Pipeline Frequency Reduction
# Summary

This epic tracks Phase 2 of the effort to eliminate the current [Test Platform On-call DRI process](https://handbook.gitlab.com/handbook/engineering/infrastructure/test-platform/oncall-rotation/). See the main issue, [Eliminate SET On-call Through Tooling Improvement and Developer Enablement](https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/2543), for more details.

## Phase 2: Live Environment Pipeline Frequency Reduction [ ~"FY25::Q4" - ~"FY26::Q1" ]

* Current State:
  * Multiple environment validations (staging/staging-ref/production). If the stability of the production environment is validated solely through E2E tests, that suggests a gap in our observability and monitoring systems: we may not be effectively capturing production health and performance metrics through standard SRE processes.
  * High environmental flakiness
  * Redundant coverage with `test-on-cng`, which runs in MR pipelines
* Implementation Path:
  * :white_check_mark: [Remove Staging Ref E2E Test Pipelines](https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/3208)
  * :clock3: [Reduce Production and Production Canary E2E Test Pipelines](https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/3302)
  * :clock3: Monitor impact via [Analyze Impact on Issues and Incidents After Reducing Production E2E Pipelines](https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/3282)
  * :pencil: [Reduce Staging and Staging Canary E2E Test Pipelines](https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/3332)
  * :pencil: Monitor impact via [Analyze Impact on Issues and Incidents After Reducing Staging E2E Pipelines](https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/3333)

### Business Impact

* **Reduced On-Call Load**: Frees up SETs for strategic work by minimizing active monitoring.
* **Faster Feedback**: Accelerates development with quicker, essential test validations.
* **Increased Stability**: Reduces environmental flakiness for more reliable test outcomes.
* **Cost Savings**: Optimizes resource usage by cutting down on redundant executions.

### Legend

- :white_check_mark: Completed
- :clock3: In Progress
- :pencil: Pending/Planning

<!-- STATUS NOTE START -->

## Status 2025-04-30

All child items are closed. This epic is done.

_Copied from https://gitlab.com/groups/gitlab-org/-/epics/16167#note_2476951980_

<!-- STATUS NOTE END -->
epic