Add partition pruning to PipelineProcessWorker
What does this MR do and why?
PipelineProcessWorker queries p_ci_pipelines by pipeline_id alone, with no partition_id constraint. Without the partition key, PostgreSQL cannot prune partitions, so every invocation has to open, lock, and probe every partition of p_ci_pipelines. The incident review identified this as a primary contributor to LockManager LWLock saturation on the CI replicas, and the cost grows with every new CI partition.
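Why touching every partition hurts the lock manager can be shown with a little arithmetic. The sketch below assumes PostgreSQL's fast-path locking as it works before PostgreSQL 18: each backend has 16 fast-path slots for weak relation locks, and locks beyond that go through the shared lock manager, whose LWLocks are what saturated. The lock-count model (parent table plus each touched partition and its indexes) is deliberately rough, and indexes_per_partition and the partition count of 8 are hypothetical values, not figures taken from the p_ci_pipelines schema.

```ruby
# Assumption: before PostgreSQL 18 each backend has 16 fast-path lock slots;
# weak relation locks beyond that spill into the shared lock manager.
FAST_PATH_SLOTS = 16

# Rough model of relation locks taken to plan one pipeline lookup:
# the partitioned parent, plus each touched partition and its indexes.
# indexes_per_partition is a hypothetical value, not read from the schema.
def relation_locks(partitions_touched, indexes_per_partition:)
  1 + partitions_touched * (1 + indexes_per_partition)
end

# True when the lookup needs more locks than the fast-path slots can hold.
def spills_to_lock_manager?(partitions_touched, indexes_per_partition:)
  relation_locks(partitions_touched, indexes_per_partition: indexes_per_partition) > FAST_PATH_SLOTS
end

puts relation_locks(1, indexes_per_partition: 4) # prints 6  (pruned: one partition)
puts relation_locks(8, indexes_per_partition: 4) # prints 41 (unpruned: eight partitions)
```

Under these assumptions a pruned lookup stays within the fast-path slots, while an unpruned one spills into the shared lock manager, and the spill grows with each added partition.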
This MR adds partition pruning to PipelineProcessWorker: the worker first looks up the pipeline scoped to the current partition and, only if the pipeline is not found there, falls back to the unscoped query. In the common case, where the pipeline belongs to the current (most recent) partition, PostgreSQL can prune the lookup down to a single partition instead of touching all of them.
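The lookup order described above can be sketched in plain Ruby. This is not the worker's actual code: PipelineRow and PartitionPrunedLookup are hypothetical names, an in-memory array stands in for p_ci_pipelines, and the pruning_enabled argument stands in for the ci_partition_pruning_workers feature flag. The queries log records which query shape each step would issue.

```ruby
# Hypothetical in-memory stand-in for a p_ci_pipelines row.
PipelineRow = Struct.new(:id, :partition_id)

# Sketch of "current partition first, then unscoped fallback".
# Names are illustrative; this is not the worker's actual implementation.
class PartitionPrunedLookup
  attr_reader :queries

  def initialize(rows, current_partition_id:, pruning_enabled:)
    @rows = rows
    @current_partition_id = current_partition_id
    @pruning_enabled = pruning_enabled # stands in for the feature flag check
    @queries = [] # log of query shapes issued, for illustration only
  end

  def find(pipeline_id)
    if @pruning_enabled
      # Scoped lookup, like WHERE partition_id = ? AND id = ?:
      # the planner can prune to a single partition.
      @queries << [:scoped, @current_partition_id]
      row = @rows.find { |r| r.partition_id == @current_partition_id && r.id == pipeline_id }
      return row if row
    end

    # Fallback (or flag off), like WHERE id = ?: touches every partition.
    @queries << [:unscoped]
    @rows.find { |r| r.id == pipeline_id }
  end
end

rows = [PipelineRow.new(1, 100), PipelineRow.new(2, 101)]
lookup = PartitionPrunedLookup.new(rows, current_partition_id: 101, pruning_enabled: true)
lookup.find(2) # found in the current partition with one scoped query
lookup.find(1) # scoped miss, then unscoped fallback
```

Issuing the scoped query first means the extra round trip is paid only for pipelines outside the current partition, which the MR expects to be the uncommon case.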
The change is gated behind the ci_partition_pruning_workers feature flag (gitlab_com_derisk, default off) so it can be safely enabled and rolled back on GitLab.com.
Closes #593701.
References
- Issue: #593701
- Feature flag rollout issue: #593744
- Incident review: https://gitlab.com/gitlab-com/gl-infra/production/-/work_items/21534#note_3158996366
- CI partitioning design doc: https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_data_decay/pipeline_partitioning/
Screenshots or screen recordings
N/A — backend-only change
How to set up and validate locally
- Enable the feature flag: Feature.enable(:ci_partition_pruning_workers)
- Create a pipeline and run the worker inline: PipelineProcessWorker.new.perform(pipeline.id)
- Observe that the generated SQL includes a partition_id constraint scoped to the current partition.
- To test the fallback: create a pipeline in a non-current partition and confirm it is still processed correctly via the unscoped query.
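For the SQL check in the steps above, a small predicate can flag whether a captured statement gives the planner a partition key to prune on. In a Rails console the statements could be collected, for example, by subscribing to ActiveSupport::Notifications "sql.active_record" events and reading payload[:sql]; that wiring is left out here. pruned_pipeline_query? is a hypothetical helper, and the two SQL strings are illustrative, not captured output.

```ruby
# Hypothetical helper: true when a statement reads p_ci_pipelines and
# filters on partition_id, i.e. the planner has a key to prune on.
def pruned_pipeline_query?(sql)
  reads_pipelines = sql.match?(/\bFROM\s+"?p_ci_pipelines"?/i)
  filters_partition = sql.match?(/\bpartition_id"?\s*=/i)
  reads_pipelines && filters_partition
end

# Illustrative statements only; the exact SQL Rails emits may differ.
scoped = 'SELECT "p_ci_pipelines".* FROM "p_ci_pipelines" ' \
         'WHERE "p_ci_pipelines"."partition_id" = 102 AND "p_ci_pipelines"."id" = 42 LIMIT 1'
unscoped = 'SELECT "p_ci_pipelines".* FROM "p_ci_pipelines" ' \
           'WHERE "p_ci_pipelines"."id" = 42 LIMIT 1'

puts pruned_pipeline_query?(scoped)   # prints true
puts pruned_pipeline_query?(unscoped) # prints false
```

With the flag enabled, the first lookup for a current-partition pipeline should look like the scoped statement; for the fallback case, expect a scoped statement followed by an unscoped one.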
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.