Persist per-project random offsets for scheduled SEP and eliminate `perform_in`
## Summary
Introduce a per-project schedule table for Scan Execution Policies (SEP) to persist random `time_window` offsets, then switch from `perform_in` to `perform_async` by baking the delay into `next_run_at`. This eliminates Sidekiq queue bloat from long-lived scheduled jobs and enables future visibility into planned execution times.
This follows the work in #586186, which consolidates time-window capping logic (`TimeWindowCappable` concern) and bakes the delay into `next_run_at` for Pipeline Execution Policies (PEP) via !225530 and !225531.
## Problem
After iterations 1 and 2 of #586186, PEP no longer uses `perform_in` — the random delay is baked into `next_run_at` on the per-project `security_pipeline_execution_project_schedules` record. However, SEP still relies on `perform_in` with runtime-computed delays because:
- SEP has **one schedule record per policy** (`security_orchestration_policy_rule_schedules`), not per project
- The fan-out to projects happens at execution time in `RuleScheduleService` and `OrchestrationPolicyRuleScheduleNamespaceWorker`
- There is no place to persist a per-project offset
This means SEP still suffers from:
1. **Sidekiq queue bloat** — orphaned `perform_in` jobs remain if policies are deleted/modified
2. **No visibility** — we cannot tell customers when a scan will actually run for a given project
## Proposed Solution
### Phase 1: Create per-project schedule table for SEP (database migration)
Create `security_scan_execution_project_schedules`:
| Column | Type | Description |
|---|---|---|
| `id` | bigint | PK |
| `rule_schedule_id` | bigint | FK → `security_orchestration_policy_rule_schedules(id)` ON DELETE CASCADE |
| `project_id` | bigint | FK → `projects(id)` ON DELETE CASCADE |
| `next_run_at` | timestamptz | Base cron time + persisted random offset |
| `next_run_applied_delay` | integer | The persisted random offset in seconds |
| `created_at` | timestamptz | |
| `updated_at` | timestamptz | |
Indexes:
- Unique index on `(rule_schedule_id, project_id)`
- Index on `project_id` (FK)
Key design decisions:
- `ON DELETE CASCADE` on `rule_schedule_id` ensures cleanup when `configuration.delete_all_schedules` runs during policy sync
- `ON DELETE CASCADE` on `project_id` handles project deletion
- No new columns on the existing `security_orchestration_policy_rule_schedules` table — the offset is per-project, not per-policy
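A migration along these lines could create the table. This is a rough sketch only: the class name, migration version, and helper usage are assumptions, and a real MR would need to follow GitLab's current migration guidelines (e.g. separate FK and index migrations where required).

```ruby
# Hypothetical migration sketch; not the actual MR content.
class CreateSecurityScanExecutionProjectSchedules < Gitlab::Database::Migration[2.2]
  def change
    create_table :security_scan_execution_project_schedules do |t|
      # FK to the per-policy rule schedule; cascade keeps rows in sync
      # when configuration.delete_all_schedules runs during policy sync.
      t.references :rule_schedule,
        null: false,
        index: false, # covered by the composite unique index below
        foreign_key: { to_table: :security_orchestration_policy_rule_schedules, on_delete: :cascade }

      # FK to projects; cascade handles project deletion.
      t.references :project,
        null: false,
        index: true,
        foreign_key: { on_delete: :cascade }

      t.datetime_with_timezone :next_run_at, null: false
      t.integer :next_run_applied_delay, null: false
      t.timestamps_with_timezone null: false

      # One row per (policy schedule, project) pair.
      t.index [:rule_schedule_id, :project_id], unique: true
    end
  end
end
```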
### Phase 2: Model + self-healing in `RuleScheduleService`
Create `Security::ScanExecutionProjectSchedule` model.
Update `RuleScheduleService#schedule_scans_using_a_worker` to find-or-create per-project rows:
```ruby
def schedule_scans_using_a_worker(branches, schedule)
  if (capped_time_window = schedule.effective_time_window)
    # Find or create the per-project row so the random offset is rolled
    # once and persisted, not recomputed on every execution.
    project_schedule = Security::ScanExecutionProjectSchedule
      .find_or_initialize_by(rule_schedule: schedule, project: project)

    if project_schedule.new_record? || project_schedule.next_run_applied_delay.nil?
      delay = Random.rand(capped_time_window)
      project_schedule.update!(
        next_run_at: schedule.next_run_at + delay.seconds,
        next_run_applied_delay: delay
      )
    end

    # Interim behavior (removed in Phase 3): still delay via Sidekiq,
    # but using the persisted offset instead of a fresh random value.
    branches.map do |branch|
      CreatePipelineWorker.perform_in(
        project_schedule.next_run_applied_delay.seconds,
        project.id, current_user.id, schedule.id, branch
      )
    end
  else
    # No time window configured: run immediately, as today.
    branches.map do |branch|
      CreatePipelineWorker.perform_async(
        project.id, current_user.id, schedule.id, branch
      )
    end
  end
end
```
Apply the same pattern in `OrchestrationPolicyRuleScheduleNamespaceWorker` for group-level policies.
**Self-healing:** No background migration needed. Per-project rows are created on first execution. Every active schedule self-heals within one cadence cycle.
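The offset-persistence invariant underlying the self-healing behavior can be illustrated with a dependency-free sketch. All names here (`ProjectScheduleRow`, `ensure_offset!`) are illustrative stand-ins, not the actual model or service code: an offset is rolled once, baked into `next_run_at`, and never re-rolled by this path on later calls.

```ruby
# Minimal stand-in for the per-project schedule row (illustrative only).
ProjectScheduleRow = Struct.new(:next_run_at, :next_run_applied_delay, keyword_init: true)

# Roll a random offset once and bake it into next_run_at, mirroring the
# find-or-create path above: an already-persisted offset is reused as-is.
def ensure_offset!(row, base_run_at, window_seconds, rng: Random.new)
  return row if row.next_run_applied_delay # already persisted: reuse

  delay = rng.rand(window_seconds) # integer seconds in [0, window_seconds)
  row.next_run_applied_delay = delay
  row.next_run_at = base_run_at + delay
  row
end

base = Time.utc(2025, 1, 1, 3, 0, 0) # base cron time for this tick
row = ProjectScheduleRow.new
ensure_offset!(row, base, 600, rng: Random.new(42))

# A second call must not re-roll the persisted offset.
ensure_offset!(row, base, 600, rng: Random.new(7))
```

The point of the sketch is that idempotence makes backfill unnecessary: whichever code path touches the row first creates the offset, and every later caller sees the same staggered time.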
### Phase 3: Switch SEP to `perform_async`
Modify `OrchestrationPolicyRuleScheduleWorker` to query `security_scan_execution_project_schedules` for runnable rows:
```ruby
def schedule_rules(schedule)
  # ... existing validation ...
  schedule.schedule_next_run!

  # Only rows whose baked-in staggered time has arrived are runnable.
  Security::ScanExecutionProjectSchedule
    .for_rule_schedule(schedule)
    .where("next_run_at < ?", Time.zone.now)
    .find_each do |project_schedule|
      project = project_schedule.project
      user = project.security_policy_bot
      next unless user

      # The offset is already baked into next_run_at, so enqueue immediately.
      Security::ScanExecutionPolicies::RuleScheduleWorker.perform_async(
        project.id, user.id, schedule.id
      )

      # Re-roll offset for next cadence
      recalculate_next_run!(project_schedule, schedule)
    end
end
```
`RuleScheduleService` no longer computes delays — it runs the scan immediately since it's called at the right time.
**Fallback for stragglers:** If a project has no row yet, the worker creates one with a random offset and skips execution for this tick. The project is picked up on the next cron tick at its staggered time.
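The re-roll and straggler fallback could look roughly as follows. This is a plain-Ruby sketch under stated assumptions: `ScheduleRow`, `process_tick`, and this `recalculate_next_run!` signature are hypothetical; a row passed in is assumed to have already been selected as runnable by the `next_run_at` query, and `nil` represents a project with no row yet.

```ruby
# Hypothetical stand-in for the per-project schedule row.
ScheduleRow = Struct.new(:next_run_at, :next_run_applied_delay, keyword_init: true)

# Re-roll: pick a fresh offset inside the capped window and bake it into
# the next base cron time, moving the project to a new staggered slot.
def recalculate_next_run!(row, next_base_run_at, window_seconds, rng: Random.new)
  row.next_run_applied_delay = rng.rand(window_seconds)
  row.next_run_at = next_base_run_at + row.next_run_applied_delay
  row
end

# One cron tick for one project: missing row means "create and skip";
# an existing (runnable) row is enqueued, then re-rolled for next time.
def process_tick(row, next_base_run_at:, window_seconds:, rng: Random.new)
  if row.nil?
    row = recalculate_next_run!(ScheduleRow.new, next_base_run_at, window_seconds, rng: rng)
    return [:skipped, row] # picked up at its staggered time on a later tick
  end

  recalculate_next_run!(row, next_base_run_at, window_seconds, rng: rng)
  [:enqueued, row]
end

now = Time.utc(2025, 1, 1, 3, 0, 0)
next_base = now + 24 * 3600 # next daily cron tick

status, new_row = process_tick(nil, next_base_run_at: next_base,
                               window_seconds: 600, rng: Random.new(1))
status2, = process_tick(new_row, next_base_run_at: next_base,
                        window_seconds: 600, rng: Random.new(2))
```

The design choice the sketch captures: a straggler never runs at the unstaggered base time, because the only way into execution is through a row whose `next_run_at` already carries an offset.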
## MR Breakdown
| MR | Content | Depends on |
|---|---|---|
| MR 1 | Database migration: create `security_scan_execution_project_schedules` table | !225530 |
| MR 2 | `ScanExecutionProjectSchedule` model + self-healing in `RuleScheduleService` and namespace worker | MR 1 |
| MR 3 | Switch cron worker to query per-project table, use `perform_async`, remove `perform_in` | MR 2 |
## Acceptance Criteria
- [ ] `security_scan_execution_project_schedules` table created with proper FKs and indexes
- [ ] `Security::ScanExecutionProjectSchedule` model implemented
- [ ] `RuleScheduleService` creates per-project rows on first execution (self-healing)
- [ ] `OrchestrationPolicyRuleScheduleNamespaceWorker` creates per-project rows for group-level policies
- [ ] Cron worker queries per-project table for runnable schedules
- [ ] `perform_in` replaced with `perform_async` for SEP
- [ ] Per-project offsets re-rolled on each cadence tick
- [ ] Fallback path for projects without per-project rows
- [ ] Comprehensive test coverage
## Future Benefits
- Enables a **planned execution dashboard** showing customers when scans will run per project
- Enables **pluggable distribution strategies** (batching, sequential) by making the offset calculation swappable
- Unifies the SEP and PEP per-project scheduling pattern
## Related
- Parent issue: #586186
- SEP capping + shared concern: !225530
- PEP baked delay: !225531
- Documentation FAQ: !225525
- Original PoC: !216217