Closing a MR doesn't cancel pipelines that contain child pipelines
Summary
The CI pipeline contains a job for MR results that triggers a child pipeline
.
If MR created and the pipeline is triggered, when Merge train MR is closed, the pipeline continues running.
Steps to reproduce
This bug is flakey and dosn't occur eveytime. It is difficult to reproduce.
.gitlab-ci.yml
:
stages:
- stage_1
- stage_2
- stage_3
job0:
stage: stage_1
script: echo "something"
job1:
stage: stage_2
trigger:
include: 'dir1/service1.yml'
rules:
- if: $CI_MERGE_REQUEST_ID
job2:
stage: stage_3
script: echo "something"
The child pipeline config dir1/service1.yml
:
job1-1:
script:
- echo "something!"
rules:
- when: always
Example Project
https://gitlab.com/gitlab-gold/asalii/ci_commit_ref_name-test
What is the current bug behavior?
Create a new branch, and commit some changes to that branch. Create an MR of that branch into main
.
Close the MR. The pipeline will continue running.
What is the expected correct behavior?
All the running pipelines should stop and be canceled once the MR is closed.
Expand for output related to the GitLab application check
(For installations with omnibus-gitlab package run and paste the output of:
sudo gitlab-rake gitlab:check SANITIZE=true
)(For installations from source run and paste the output of:
sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true
)(we will only investigate if the tests are passing)
Possible fixes
- Bring the
retry_lock
closer to the point of update - add
retry_lock
when updatingauto_canceled_by_pipeline_id
- Up the number of retries, currently we retry once
- When we do see an
ActiveRecord::StaleObjectError
in the parents, might we may still want to cancel the children?
Implementation Guide
Implements 1 and 2
diff --git a/app/models/ci/pipeline.rb b/app/models/ci/pipeline.rb
index 1cbe63e596d6..fa7d8e43a257 100644
--- a/app/models/ci/pipeline.rb
+++ b/app/models/ci/pipeline.rb
@@ -668,12 +668,10 @@ def cancel_running(retries: 1, cascade_to_children: true, auto_canceled_by_pipel
end
cancel_jobs(cancelable_statuses, retries: retries, auto_canceled_by_pipeline_id: auto_canceled_by_pipeline_id)
+ cancel_child_pipelines(cascade_to_children, retries, auto_canceled_by_pipeline_id, execute_async)
- if cascade_to_children
- # cancel any bridges that could spin up new child pipelines
- cancel_jobs(bridges_in_self_and_project_descendants.cancelable, retries: retries, auto_canceled_by_pipeline_id: auto_canceled_by_pipeline_id)
- cancel_children(auto_canceled_by_pipeline_id: auto_canceled_by_pipeline_id, execute_async: execute_async)
- end
+ rescue ActiveRecord::StaleObjectError
+ cancel_child_pipelines(cascade_to_children, retries, auto_canceled_by_pipeline_id, execute_async)
end
# rubocop: disable CodeReuse/ServiceClass
@@ -1351,8 +1349,15 @@ def merge_request_diff
private
+ def cancel_child_pipelines(cascade_to_children, retries, auto_canceled_by_pipeline_id, execute_async)
+ return unless cascade_to_children
+
+ # cancel any bridges that could spin up new child pipelines
+ cancel_jobs(bridges_in_self_and_project_descendants.cancelable, retries: retries, auto_canceled_by_pipeline_id: auto_canceled_by_pipeline_id)
+ cancel_children(auto_canceled_by_pipeline_id: auto_canceled_by_pipeline_id, execute_async: execute_async)
+ end
+
def cancel_jobs(jobs, retries: 1, auto_canceled_by_pipeline_id: nil)
- retry_lock(jobs, retries, name: 'ci_pipeline_cancel_running') do |statuses|
preloaded_relations = [:project, :pipeline, :deployment, :taggings]
statuses.find_in_batches do |status_batch|
@@ -1360,8 +1365,10 @@ def cancel_jobs(jobs, retries: 1, auto_canceled_by_pipeline_id: nil)
Preloaders::CommitStatusPreloader.new(relation).execute(preloaded_relations)
relation.each do |job|
- job.auto_canceled_by_id = auto_canceled_by_pipeline_id if auto_canceled_by_pipeline_id
- job.cancel
+ retry_lock(job, retries, name: 'ci_pipeline_cancel_running') do |subject|
+ subject.auto_canceled_by_id = auto_canceled_by_pipeline_id if auto_canceled_by_pipeline_id
+ subject.cancel
+ end
end
end
end