Throttle scheduled PEP based on running pipelines

With scheduled pipeline execution policies (PEP), it is possible to enforce starting pipelines on a schedule in each project that the policy applies to. More details in &14147.

There is a risk of a backlog of pipelines piling up and overwhelming runners. For example, consider a group of 100k projects with a scheduled PEP that runs each day, where each pipeline runs for several hours.

We are taking some measures to reduce this risk by:

Limiting the cadence choices to daily, weekly, monthly and so on.
Introducing a time window to not start all jobs at once.
Deduplicating Sidekiq jobs that start pipelines.
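
To illustrate the time window idea, here is a minimal sketch of one way start times could be spread out. It is not the actual implementation: the helper name `start_offset_seconds` and the choice of hashing the project ID are assumptions made for illustration only.

```ruby
require "zlib"

# Hypothetical helper (not part of the codebase): maps a project ID to a
# stable offset in seconds inside a time window, so that pipelines for
# different projects do not all start at the same moment.
def start_offset_seconds(project_id, window_seconds)
  raise ArgumentError, "window must be positive" unless window_seconds.positive?

  # CRC32 gives a deterministic integer for the same project ID,
  # and the modulo keeps the offset inside the window.
  Zlib.crc32(project_id.to_s) % window_seconds
end
```

With a one-hour window, `start_offset_seconds(project_id, 3600)` always returns the same value between 0 and 3599 for a given project, so a retry lands in the same slot.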

However, the risk remains because we have limited control over runner capacity and pipeline duration.

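The problem could be solved by snoozing a schedule while PEP scheduled pipelines are still running on the project. In the diff below, the worker re-enqueues itself five minutes later when the number of running PEP scheduled pipelines is at least the number of schedules for the project.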

diff --git a/ee/app/workers/security/pipeline_execution_policies/run_schedule_worker.rb b/ee/app/workers/security/pipeline_execution_policies/run_schedule_worker.rb
index 26c9a3d18eee..1e42e89398bf 100644
--- a/ee/app/workers/security/pipeline_execution_policies/run_schedule_worker.rb
+++ b/ee/app/workers/security/pipeline_execution_policies/run_schedule_worker.rb
@@ -18,6 +18,15 @@ def perform(schedule_id)
 
         return if Feature.disabled?(:scheduled_pipeline_execution_policies, schedule.project)
 
+        number_of_running_pipelines = schedule.project.all_pipelines.where(source: :pipeline_execution_policy_schedule, status: :running).count
+        number_of_schedules = Security::PipelineExecutionProjectSchedule.where(project_id: schedule.project_id).count
+
+        if number_of_running_pipelines >= number_of_schedules
+          self.class.perform_in(5.minutes, schedule.id)
+
+          return
+        end
+
         result = execute(schedule)
 
         log_pipeline_creation_failure(result, schedule) if result.error?
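
The snooze decision in the diff can be sketched as a plain Ruby function, without Rails or Sidekiq, to make the rule easier to reason about. The names `schedule_action` and `SNOOZE_SECONDS` are hypothetical. In the worker, the two counts come from database queries.

```ruby
# Framework-free sketch of the snooze rule from the diff.
# SNOOZE_SECONDS mirrors the 5-minute delay used in the worker.
SNOOZE_SECONDS = 5 * 60

# Returns what the worker would do for the given counts:
# snooze when at least as many PEP scheduled pipelines are running
# as there are schedules for the project, otherwise run the schedule.
def schedule_action(running_pipelines:, schedules:)
  if running_pipelines >= schedules
    { action: :snooze, retry_in: SNOOZE_SECONDS }
  else
    { action: :run }
  end
end
```

For a project with two schedules, one running pipeline still allows a new pipeline to start, while two running pipelines cause the schedule to be retried five minutes later.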

A downside of this solution is that it is not visible to the user. Policy owners have no way of telling whether something is wrong with the policy or whether the schedule was snoozed. I think we should still go for this solution to avoid incidents, but put it behind a feature flag so we can disable it if needed.
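
As a rough sketch of the feature flag gating, the throttle check could be reduced to a predicate that is only active when the flag is on. The method name `throttle?` and the `flag_enabled` parameter are hypothetical. In the worker, the flag state would come from the feature flag system rather than being passed in.

```ruby
# Hypothetical predicate: throttling only applies while the flag is
# enabled, so disabling the flag restores the previous behavior.
def throttle?(flag_enabled:, running_pipelines:, schedules:)
  flag_enabled && running_pipelines >= schedules
end
```

With the flag disabled, `throttle?` is always false and every schedule run proceeds as it does today.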

This issue is confidential because it points out a weak point in scheduled pipeline execution policies.

Edited Feb 26, 2025 by Andy Schoenen