Spike: Scheduled PEP test-run feature
Problem Statement
Customers are hesitant to enable Scheduled Pipeline Execution Policies (SPEP) due to lack of visibility into the potential impact. They need to understand:
- How many pipelines will be triggered at once
- Estimated execution duration for all pipelines
- Estimated distribution time window
- Resource utilization impact on their infrastructure
This idea came up in this discussion: &17875 (comment 2597450847)
Proposed Solution Overview
Implement a test-run feature (renamed from "dry-run" to better reflect that pipelines will actually execute and may cause system changes) that allows users to:
- Test Pipeline Execution: Select a single project to execute a Scheduled Pipeline Execution Policy and measure actual pipeline duration
- Store Performance Data: Capture and store pipeline execution metrics for impact assessment
- Historical Data Collection: Aggregate duration data from both test-runs and actual scheduled executions to improve accuracy over time
- Impact Assessment: Provide resource utilization estimates and total execution time projections
Technical Requirements
API Implementation (Backend Only)
- Single Project Selection: Initial implementation supports testing on one project at a time
- Project Owner Permission: Only project owners can initiate test-runs for their projects
- Rate Limiting: Enforce a maximum of one test-run per policy per hour to prevent resource abuse
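The permission and rate-limiting rules above could be sketched roughly as follows. This is an in-memory illustration only: `PolicyTestRunLimiter`, `can_start_test_run`, and the `is_project_owner` flag are hypothetical names, and a real implementation would presumably keep its state in shared storage rather than in a process-local dictionary.

```python
from datetime import datetime, timedelta, timezone

# Taken from the requirement above: at most one test-run per policy per hour.
TEST_RUN_INTERVAL = timedelta(hours=1)


class PolicyTestRunLimiter:
    """Hypothetical in-memory limiter keyed by policy ID."""

    def __init__(self, interval: timedelta = TEST_RUN_INTERVAL) -> None:
        self._interval = interval
        self._last_run: dict[int, datetime] = {}

    def try_acquire(self, policy_id: int, now: datetime) -> bool:
        """Return True and record the run if the policy is not rate limited."""
        last = self._last_run.get(policy_id)
        if last is not None and now - last < self._interval:
            return False  # a test-run already happened within the interval
        self._last_run[policy_id] = now
        return True


def can_start_test_run(
    is_project_owner: bool,
    limiter: PolicyTestRunLimiter,
    policy_id: int,
    now: datetime,
) -> bool:
    """Check ownership first so a rejected user does not consume the quota."""
    if not is_project_owner:
        return False
    return limiter.try_acquire(policy_id, now)
```

Checking ownership before touching the limiter means a user who is not a project owner cannot use up the hourly slot for a policy.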
Data Storage & Management
- Store test-run results associated with the Scheduled Pipeline Execution Policy configuration
- Capture pipeline execution duration, success/failure status, and resource metrics
- Collect historical duration data from both test-runs and actual scheduled pipeline executions
- Maintain data association between policy configuration and performance metrics
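One possible shape for a stored result is sketched below. All field and type names (`PolicyRunRecord`, `RunSource`, `RunStatus`, `resource_metrics`) are assumptions for illustration. The spike does not say which resource metrics to capture, so that field is left as an open key/value mapping.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class RunSource(Enum):
    TEST_RUN = "test_run"    # triggered through the test-run API
    SCHEDULED = "scheduled"  # triggered by the policy's actual schedule


class RunStatus(Enum):
    SUCCESS = "success"
    FAILED = "failed"


@dataclass(frozen=True)
class PolicyRunRecord:
    """Hypothetical record linking one pipeline run to a policy configuration."""

    policy_id: int
    project_id: int
    pipeline_id: int
    source: RunSource
    status: RunStatus
    started_at: datetime
    finished_at: datetime
    # Placeholder: the concrete resource metrics are not defined in the spike.
    resource_metrics: dict[str, float] = field(default_factory=dict)

    @property
    def duration_seconds(self) -> float:
        """Wall-clock duration derived from the stored timestamps."""
        return (self.finished_at - self.started_at).total_seconds()
```

Storing the `source` alongside each record would let later aggregation combine or separate test-run and scheduled executions as needed.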
Impact Assessment Calculations
- Resource Utilization Estimates: Calculate expected infrastructure load based on:
  - Number of projects in scope
  - Average pipeline duration from test-runs/historical data
  - Configured time window distribution
- Total Estimated Execution Time: Estimate the total time needed for all pipelines to complete
- Distribution Timeline: Show estimated pipeline execution distribution over the configured time window
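As a rough sketch of how these estimates might be derived, the function below assumes that pipeline starts are spread evenly across the time window and that every pipeline takes the average duration. Both are simplifying assumptions that the spike does not specify, and `estimate_impact` and `ImpactEstimate` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpactEstimate:
    total_pipeline_seconds: float        # summed runtime of all pipelines
    estimated_concurrent_pipelines: int  # approximate pipelines running at once
    estimated_completion_seconds: float  # time from first start to last finish


def estimate_impact(
    project_count: int,
    avg_duration_seconds: float,
    time_window_seconds: float,
) -> ImpactEstimate:
    """Estimate load assuming evenly spaced starts and uniform durations."""
    if project_count < 0 or avg_duration_seconds < 0 or time_window_seconds < 0:
        raise ValueError("inputs must be non-negative")
    if project_count == 0:
        return ImpactEstimate(0.0, 0, 0.0)

    total = project_count * avg_duration_seconds

    if time_window_seconds == 0:
        # No distribution window: every pipeline starts at the same time.
        concurrent = project_count
        last_start = 0.0
    else:
        # Start rate times duration gives the average number in flight.
        concurrent = min(
            project_count, max(1, math.ceil(total / time_window_seconds))
        )
        # With even spacing, the last pipeline starts just before the window ends.
        last_start = time_window_seconds * (project_count - 1) / project_count

    return ImpactEstimate(total, concurrent, last_start + avg_duration_seconds)
```

For example, `estimate_impact(1000, 300, 3600)` models 1000 projects with 5-minute pipelines spread over a one-hour window and yields about 84 pipelines running concurrently under these assumptions.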
Performance Data Collection
- Store pipeline duration from test-runs
- Aggregate historical data from actual scheduled pipeline executions
- Improve accuracy of estimates as more data is collected
- Track success/failure rates for reliability assessment
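The aggregation described above might look like the sketch below. `RunSample`, `DurationSummary`, and `summarize_runs` are hypothetical names. Including failed runs in the duration statistics is an assumption made here, not something the spike specifies.

```python
from statistics import mean, median
from typing import NamedTuple, Optional, Sequence


class RunSample(NamedTuple):
    duration_seconds: float
    succeeded: bool


class DurationSummary(NamedTuple):
    sample_count: int
    mean_seconds: float
    median_seconds: float
    success_rate: float  # fraction of samples that succeeded, 0.0 to 1.0


def summarize_runs(samples: Sequence[RunSample]) -> Optional[DurationSummary]:
    """Aggregate test-run and scheduled samples; None when there is no data."""
    if not samples:
        return None
    # Assumption: failed runs still count toward duration statistics.
    durations = [s.duration_seconds for s in samples]
    successes = sum(1 for s in samples if s.succeeded)
    return DurationSummary(
        sample_count=len(samples),
        mean_seconds=mean(durations),
        median_seconds=median(durations),
        success_rate=successes / len(samples),
    )
```

Reporting the median alongside the mean keeps a single unusually slow pipeline from dominating the estimate while the sample size is still small.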
Implementation Phases
Phase 1: Core Test-Run Functionality
- API endpoint to trigger test-run on selected project
- Execute SPEP pipeline on chosen project and capture duration
- Store test-run results with policy association
- Basic impact calculation based on single test-run data
Phase 2: Historical Data Integration
- Collect duration data from actual scheduled pipeline executions
- Aggregate historical performance metrics
- Enhanced impact assessment using combined test-run and historical data
- Improved accuracy for resource utilization estimates
Success Criteria
- Users can execute test-runs on selected projects to validate SPEP configuration
- Pipeline execution duration is accurately captured and stored
- Impact assessment provides meaningful resource utilization estimates
- Rate limiting prevents system abuse
- Historical data collection improves estimate accuracy over time
- Foundation established for future UI integration
Related Issues
Notes
- This is a backend-only spike focusing on API implementation
- UI integration will be addressed in future iterations
- Test-runs execute actual pipelines (not simulations) to provide accurate timing data
- Feature serves dual purpose: configuration validation and impact assessment
Edited by Alan (Maciej) Paruszewski