Persist per-project random offsets for scheduled SEP and eliminate `perform_in`
## Summary

Introduce a per-project schedule table for Scan Execution Policies (SEP) to persist random `time_window` offsets, then switch from `perform_in` to `perform_async` by baking the delay into `next_run_at`. This eliminates Sidekiq queue bloat from long-lived scheduled jobs and enables future visibility into planned execution times.

This follows the work in #586186, which consolidates time-window capping logic (`TimeWindowCappable` concern) and bakes the delay into `next_run_at` for Pipeline Execution Policies (PEP) via !225530 and !225531.

## Problem

After iterations 1 and 2 of #586186, PEP no longer uses `perform_in` — the random delay is baked into `next_run_at` on the per-project `security_pipeline_execution_project_schedules` record. However, SEP still relies on `perform_in` with runtime-computed delays because:

- SEP has **one schedule record per policy** (`security_orchestration_policy_rule_schedules`), not one per project
- The fan-out to projects happens at execution time in `RuleScheduleService` and `OrchestrationPolicyRuleScheduleNamespaceWorker`
- There is no place to persist a per-project offset

This means SEP still suffers from:

1. **Sidekiq queue bloat** — orphaned `perform_in` jobs remain if policies are deleted or modified
2. **No visibility** — we cannot tell customers when a scan will actually run for a given project

## Proposed Solution

### Phase 1: Create per-project schedule table for SEP (database migration)

Create `security_scan_execution_project_schedules`:

| Column | Type | Description |
|---|---|---|
| `id` | bigint | PK |
| `rule_schedule_id` | bigint | FK → `security_orchestration_policy_rule_schedules(id)` ON DELETE CASCADE |
| `project_id` | bigint | FK → `projects(id)` ON DELETE CASCADE |
| `next_run_at` | timestamptz | Base cron time + persisted random offset |
| `next_run_applied_delay` | integer | The persisted random offset in seconds |
| `created_at` | timestamptz | |
| `updated_at` | timestamptz | |

Indexes:

- Unique index on `(rule_schedule_id, project_id)`
- Index on `project_id` (FK)

Key design decisions:

- `ON DELETE CASCADE` on `rule_schedule_id` ensures cleanup when `configuration.delete_all_schedules` runs during policy sync
- `ON DELETE CASCADE` on `project_id` handles project deletion
- No new columns on the existing `security_orchestration_policy_rule_schedules` table — the offset is per-project, not per-policy

### Phase 2: Model + self-healing in `RuleScheduleService`

Create a `Security::ScanExecutionProjectSchedule` model. Update `RuleScheduleService#schedule_scans_using_a_worker` to find-or-create per-project rows:

```ruby
def schedule_scans_using_a_worker(branches, schedule)
  if (capped_time_window = schedule.effective_time_window)
    project_schedule = Security::ScanExecutionProjectSchedule
      .find_or_initialize_by(rule_schedule: schedule, project: project)

    if project_schedule.new_record? || project_schedule.next_run_applied_delay.nil?
      delay = Random.rand(capped_time_window)
      project_schedule.update!(
        next_run_at: schedule.next_run_at + delay.seconds,
        next_run_applied_delay: delay
      )
    end

    branches.map do |branch|
      CreatePipelineWorker.perform_in(
        project_schedule.next_run_applied_delay.seconds,
        project.id, current_user.id, schedule.id, branch
      )
    end
  else
    branches.map do |branch|
      CreatePipelineWorker.perform_async(project.id, current_user.id, schedule.id, branch)
    end
  end
end
```

Apply the same pattern in `OrchestrationPolicyRuleScheduleNamespaceWorker` for group-level policies.

**Self-healing:** No background migration needed. Per-project rows are created on first execution. Every active schedule self-heals within one cadence cycle.

### Phase 3: Switch SEP to `perform_async`

Modify `OrchestrationPolicyRuleScheduleWorker` to query `security_scan_execution_project_schedules` for runnable rows:

```ruby
def schedule_rules(schedule)
  # ... existing validation ...
  schedule.schedule_next_run!

  Security::ScanExecutionProjectSchedule
    .for_rule_schedule(schedule)
    .where("next_run_at < ?", Time.zone.now)
    .find_each do |project_schedule|
      project = project_schedule.project
      user = project.security_policy_bot
      next unless user

      Security::ScanExecutionPolicies::RuleScheduleWorker.perform_async(
        project.id, user.id, schedule.id
      )

      # Re-roll offset for next cadence
      recalculate_next_run!(project_schedule, schedule)
    end
end
```

`RuleScheduleService` no longer computes delays — it runs the scan immediately, since it is now invoked at the right time.

**Fallback for stragglers:** If a project has no row yet, the worker creates one with a random offset and skips execution for this tick. The project is picked up on the next cron tick at its staggered time.
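The Phase 3 snippet calls `recalculate_next_run!`, which is not spelled out above. Below is a minimal plain-Ruby sketch of the intended re-roll step, assuming it mirrors the Phase 2 offset assignment; the standalone method name, return shape, and injectable `rng:` parameter are illustration-only, not the real model API:

```ruby
# Sketch of the re-roll: pick a fresh random offset inside the capped
# time window and anchor it to the schedule's next base cron time.
# `rng:` is injectable only so the behavior is deterministic in tests.
def recalculate_next_run(base_next_run_at, capped_time_window_seconds, rng: Random.new)
  delay = rng.rand(capped_time_window_seconds) # integer in 0...window

  {
    next_run_at: base_next_run_at + delay, # Time + seconds => Time
    next_run_applied_delay: delay
  }
end
```

In the actual MR this would presumably be a `project_schedule.update!(next_run_at: ..., next_run_applied_delay: ...)` against the new table, using `schedule.next_run_at` as the base.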
## MR Breakdown

| MR | Content | Depends on |
|---|---|---|
| MR 1 | Database migration: create `security_scan_execution_project_schedules` table | !225530 |
| MR 2 | `ScanExecutionProjectSchedule` model + self-healing in `RuleScheduleService` and namespace worker | MR 1 |
| MR 3 | Switch cron worker to query per-project table, use `perform_async`, remove `perform_in` | MR 2 |

## Acceptance Criteria

- [ ] `security_scan_execution_project_schedules` table created with proper FKs and indexes
- [ ] `Security::ScanExecutionProjectSchedule` model implemented
- [ ] `RuleScheduleService` creates per-project rows on first execution (self-healing)
- [ ] `OrchestrationPolicyRuleScheduleNamespaceWorker` creates per-project rows for group-level policies
- [ ] Cron worker queries per-project table for runnable schedules
- [ ] `perform_in` replaced with `perform_async` for SEP
- [ ] Per-project offsets re-rolled on each cadence tick
- [ ] Fallback path for projects without per-project rows
- [ ] Comprehensive test coverage

## Future Benefits

- Enables a **planned execution dashboard** showing customers when scans will run per project
- Enables **pluggable distribution strategies** (batching, sequential) by making the offset calculation swappable
- Unifies the SEP and PEP per-project scheduling pattern

## Related

- Parent issue: #586186
- SEP capping + shared concern: !225530
- PEP baked delay: !225531
- Documentation FAQ: !225525
- Original PoC: !216217
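As an appendix, the self-healing and re-roll semantics proposed above can be simulated in a few lines of plain Ruby. This is purely illustrative: a `Hash` stands in for the new `security_scan_execution_project_schedules` table, and every class and method name here is hypothetical:

```ruby
# In-memory simulation of the per-project offset lifecycle (illustration only).
class ScheduleSimulator
  def initialize(rng: Random.new)
    @rows = {} # { [rule_schedule_id, project_id] => { delay:, next_run_at: } }
    @rng = rng
  end

  # Self-healing path (Phase 2): roll an offset on first sight of a
  # (rule_schedule, project) pair; later calls reuse the persisted offset.
  def find_or_create(rule_schedule_id, project_id, base_run_at, window)
    @rows[[rule_schedule_id, project_id]] ||= begin
      delay = @rng.rand(window)
      { delay: delay, next_run_at: base_run_at + delay }
    end
  end

  # Cron-tick path (Phase 3): re-roll the offset for the next cadence.
  def reroll(rule_schedule_id, project_id, next_base_run_at, window)
    delay = @rng.rand(window)
    @rows[[rule_schedule_id, project_id]] = { delay: delay, next_run_at: next_base_run_at + delay }
  end
end
```

The invariant the table buys us: an offset is rolled once, reused verbatim until the cron worker re-rolls it, so a project's run time is knowable ahead of execution rather than decided inside a pending `perform_in` job.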