Batch database updates for cancel_build to improve performance at scale
Problem
Currently, Ci::CancelPipelineService#cancel_jobs processes job cancellations individually using the state machine event. For pipelines with many jobs (e.g., 1000+ cancelable jobs), this approach could be optimized by batching database updates instead of processing each job separately.
Background
Related to !199937 (merged) and #382065 (closed)
In the current implementation, we iterate through jobs and call cancel_job on each one individually, which triggers the state machine event. While this ensures proper state transitions, the cost grows with the number of jobs: each cancellation is a separate transition and a separate database write.
Related Discussion
From !199937 (merged):
"I think this is possible, it's not doing that much so we probably could do a batch database update, but that would mean we would need to not use the state machine event."
@fabiopitino raised: "Can we make cancel_jobs more resilient at scale? Imagine that a pipeline has 1000 cancelable jobs."
Proposal
Investigate and implement batch database updates for job cancellation to improve performance when canceling pipelines with many jobs. This would likely require:
- Not using the state machine event for individual jobs
- Batch updating job statuses directly in the database
- Ensuring all necessary side effects of the state machine are still handled correctly
- Maintaining data consistency and proper state transitions
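One way to approach the consistency point is to make the batch update conditional on the job's current status, so that a job which finished between loading and updating is not overwritten. The sketch below illustrates the idea in plain Ruby with an in-memory stand-in for the table. `Row`, `CANCELABLE`, and `guarded_batch_update` are hypothetical names for illustration only, and the list of cancelable statuses is an assumption, not GitLab's actual definition.

```ruby
# Hypothetical in-memory stand-in for the jobs table; not GitLab code.
Row = Struct.new(:id, :status, keyword_init: true)

# Statuses assumed to be cancelable, for this illustration only.
CANCELABLE = %w[created pending running].freeze

# Mimics `UPDATE ... WHERE id IN (ids) AND status IN (CANCELABLE)`.
# Returns the ids that were actually changed.
def guarded_batch_update(rows, ids, target_status)
  rows.select { |row| ids.include?(row.id) && CANCELABLE.include?(row.status) }
      .each { |row| row.status = target_status }
      .map(&:id)
end

rows = [
  Row.new(id: 1, status: 'pending'),
  Row.new(id: 2, status: 'success'), # finished between load and update
  Row.new(id: 3, status: 'created')
]

updated_ids = guarded_batch_update(rows, [1, 2, 3], 'canceled')
```

Returning the ids that were actually changed would let any follow-up side effects run only for jobs whose status really moved, rather than for every job that was loaded.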
Rough idea:
def target_cancel_status_for(job)
  # TODO: Add preloads so these won't trigger an N+1
  if job.running? && job.supports_canceling?
    'canceling'
  else
    'canceled'
  end
end

def batch_cancel_jobs(jobs)
  now = Time.current # one timestamp for the whole batch
  jobs_by_status = jobs.group_by { |job| target_cancel_status_for(job) }

  jobs_by_status.each do |target_status, job_group|
    job_ids = job_group.map(&:id)

    # update_all issues a single UPDATE and skips validations, callbacks,
    # and state machine transitions.
    # NOTE: finished_at is set for both groups here; it may only be
    # appropriate for 'canceled', since 'canceling' jobs have not finished yet.
    CommitStatus.where(id: job_ids).update_all(
      status: target_status,
      finished_at: now,
      updated_at: now
    )

    # Handle necessary side effects that the state machine would normally trigger
    # (e.g., cleanup, notifications, etc.)
  end
end
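For very large pipelines it may also help to cap how many ids go into a single UPDATE. The following is a minimal, self-contained sketch of the planning step only, with no Rails dependency: it groups jobs by target status as in the rough idea above, then slices each group into bounded batches. `Job`, `target_status_for`, `plan_batches`, and the `batch_size` value are hypothetical and chosen for illustration.

```ruby
# Hypothetical plain-Ruby stand-in for a CI job; not the GitLab model.
Job = Struct.new(:id, :running, :supports_canceling, keyword_init: true)

def target_status_for(job)
  job.running && job.supports_canceling ? 'canceling' : 'canceled'
end

# Returns [[target_status, [ids...]], ...] with each id list capped at batch_size.
def plan_batches(jobs, batch_size:)
  jobs.group_by { |job| target_status_for(job) }.flat_map do |status, group|
    group.map(&:id).each_slice(batch_size).map { |ids| [status, ids] }
  end
end

jobs = [
  Job.new(id: 1, running: true, supports_canceling: true),
  Job.new(id: 2, running: false, supports_canceling: false),
  Job.new(id: 3, running: false, supports_canceling: true),
  Job.new(id: 4, running: true, supports_canceling: false),
  Job.new(id: 5, running: true, supports_canceling: true)
]

plan = plan_batches(jobs, batch_size: 2)
```

Each entry in `plan` would then map to one batch update, so the number of statements depends on the batch size rather than on the number of jobs.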
Benefits
- Improved performance for canceling large pipelines
- Reduced database load when processing many job cancellations
- More resilient cancellation process at scale