Fix handling of DAGs within a stage in AtomicProcessingService
## What does this MR do and why?
Suppose you had a `.gitlab-ci.yml` with a manual job listed after a `needs`:

```yaml
test1:
  needs: [manual1]
  script: exit 0

manual1:
  stage: test
  when: manual
  script: exit 0
```
Previously the `test1` job would be stuck in the `created` state instead of `skipped` because `AtomicProcessingService` did not consider a DAG within a stage. `test1` only gets updated to `skipped` if `Ci::ProcessBuildService` is called on `manual1` first, but since `test1` is listed first, `AtomicProcessingService` will usually call `Ci::ProcessBuildService` on it first. The update order previously depended on the order returned by the database in `StatusCollection#all_jobs` (https://gitlab.com/gitlab-org/gitlab/-/blob/3d7a5af7ce7b18c32c0a1e893f05f9c7bc6b4b92/app/services/ci/pipeline_processing/atomic_processing_service/status_collection.rb#L109-125).

This commit introduces a `ci_atomic_processing_ordered_update_stage` feature flag that performs a topological sort on all jobs in the stage and updates the jobs in that order. This is similar to the approach taken in `ResetSkippedJobsService`.
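To illustrate the ordering idea, here is a minimal sketch using Ruby's stdlib `TSort`. The `StageJobSorter` name is hypothetical and this is not the MR's actual code; it only shows why a topological sort over `needs:` puts `manual1` before `test1`:

```ruby
require 'tsort'

# Hypothetical sketch: order jobs within a stage so that every job is
# updated after the jobs it `needs:`. Not GitLab's actual implementation.
class StageJobSorter
  include TSort

  # jobs maps each job name to the names it needs, e.g.
  # { "test1" => ["manual1"], "manual1" => [] }
  def initialize(jobs)
    @jobs = jobs
  end

  def tsort_each_node(&block)
    @jobs.each_key(&block)
  end

  def tsort_each_child(job, &block)
    @jobs.fetch(job, []).each(&block)
  end
end

jobs = { "test1" => ["manual1"], "manual1" => [] }
# TSort#tsort emits dependencies first, so manual1 is processed before test1:
puts StageJobSorter.new(jobs).tsort.inspect # => ["manual1", "test1"]
```

Processing `manual1` first means `Ci::ProcessBuildService` already knows its status when it reaches `test1`, regardless of the order jobs appear in the YAML.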
Relates to #450395 (closed)
## MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
## How to set up and validate locally
- Enable the `ci_atomic_processing_ordered_update_stage` feature flag. In `bin/rails console`, run `Feature.enable(:ci_atomic_processing_ordered_update_stage)`.
- On this branch, create a `.gitlab-ci.yml`:

  ```yaml
  manual1:
    stage: test
    when: manual
    script: exit 0

  test1:
    stage: test
    needs: [manual1]
    script: exit 0
  ```
- Run the pipeline and see that it is skipped.
- Now switch the order:

  ```yaml
  test1:
    stage: test
    needs: [manual1]
    script: exit 0

  manual1:
    stage: test
    when: manual
    script: exit 0
  ```
- The pipeline should also be skipped.
- For extra credit, create a really large DAG with this script:

  ```ruby
  jobs = []
  200.times do |index|
    jobs << { name: "test#{index}", needs: "test#{index + 1}" }
  end
  jobs << { name: "test200", when: "manual" }

  jobs.each do |job|
    puts "#{job[:name]}:"
    puts "  needs: [#{job[:needs]}]" if job[:needs]
    puts "  when: #{job[:when]}" if job[:when]
    puts "  script: exit 0"
  end
  ```
- Use that script in an actual CI job.
- In addition, you can apply this diff to read `/tmp/dag.yml`, to check that the SQL queries look similar in `log/test.log`, and to profile the memory usage in `/tmp/profile.txt`:
  ```diff
  diff --git a/spec/services/ci/pipeline_processing/atomic_processing_service_spec.rb b/spec/services/ci/pipeline_processing/atomic_processing_service_spec.rb
  index 81f21ed70531..c2ca5d62d0d8 100644
  --- a/spec/services/ci/pipeline_processing/atomic_processing_service_spec.rb
  +++ b/spec/services/ci/pipeline_processing/atomic_processing_service_spec.rb
  @@ -928,6 +928,28 @@ def event_on_pipeline(event)
       end
     end

  +    context 'with large DAG' do
  +      let(:config) { File.read('/tmp/dag.yml') }
  +
  +      let(:pipeline) do
  +        Ci::CreatePipelineService.new(project, user, { ref: 'master' }).execute(:push).payload
  +      end
  +
  +      before do
  +        stub_feature_flags(ci_atomic_processing_ordered_update_stage: true)
  +        stub_ci_pipeline_yaml_file(config)
  +        Rails.logger.info "=== before process"
  +        require 'memory_profiler'
  +        report = MemoryProfiler.report { process_pipeline }
  +        Rails.logger.info "=== after process"
  +        output = File.open('/tmp/profile.txt', 'w')
  +        report.pretty_print(output)
  +      end
  +
  +      it 'does something' do
  +        expect(all_builds_names_and_statuses).to eq({})
  +      end
  +    end
  +
      context 'when dependent jobs are listed after job needs' do
        let(:feature_flag_state) { true }
        let(:config) do
  ```
The memory profile shows we allocate a lot of RAM logging queries and backtraces!
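As a quick sanity check, the DAG generator from the validation steps above can also be run in-memory and its output parsed with Ruby's stdlib YAML library. This is a hypothetical helper script for local verification, not part of this MR:

```ruby
require 'yaml'

# Build the same 201-job chain as the generator script above:
# test0 needs test1, ..., test199 needs test200, and test200 is manual.
jobs = []
200.times do |index|
  jobs << { name: "test#{index}", needs: "test#{index + 1}" }
end
jobs << { name: "test200", when: "manual" }

config = jobs.map do |job|
  lines = ["#{job[:name]}:"]
  lines << "  needs: [#{job[:needs]}]" if job[:needs]
  lines << "  when: #{job[:when]}" if job[:when]
  lines << "  script: exit 0"
  lines.join("\n")
end.join("\n")

# Write the config where the spec diff above expects to find it.
File.write('/tmp/dag.yml', config)

parsed = YAML.safe_load(config)
puts parsed.size             # => 201
puts parsed['test0'].inspect # => {"needs"=>["test1"], "script"=>"exit 0"}
```

Confirming the file parses before pushing it to a pipeline saves a round trip through CI when experimenting with different DAG shapes.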