Fix handling of DAGs within a stage in AtomicProcessingService
## What does this MR do and why?
Suppose you had a `.gitlab-ci.yml` with a manual job listed after a job that depends on it via `needs`:
```yaml
test1:
  needs: [manual1]
  script: exit 0

manual1:
  stage: test
  when: manual
  script: exit 0
```
Previously, the `test1` job would be stuck in the `created` state instead of being `skipped`, because `AtomicProcessingService` did not consider a DAG within a stage. `test1` only gets updated to `skipped` if `Ci::ProcessBuildService` is called on `manual1` first, but since `test1` is listed first, `AtomicProcessingService` would usually call `Ci::ProcessBuildService` on `test1` first. The update order depended on the order returned by the database in [`StatusCollection#all_jobs`](https://gitlab.com/gitlab-org/gitlab/-/blob/3d7a5af7ce7b18c32c0a1e893f05f9c7bc6b4b92/app/services/ci/pipeline_processing/atomic_processing_service/status_collection.rb#L109-125).
This commit introduces a `ci_atomic_processing_ordered_update_stage` feature flag that performs a topological sort on all jobs in the stage and updates the jobs in that order. This is similar to the approach taken in `ResetSkippedJobsService`.
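For illustration only (this is not the code added in this MR), here is a minimal sketch of what ordering a stage's jobs topologically by their `needs` looks like, using Ruby's standard-library `TSort`:

```ruby
require 'tsort'

# Illustrative sketch: orders the jobs of a single stage so that every job
# comes after the jobs it needs, regardless of definition order.
class StageJobOrdering
  include TSort

  # jobs: Hash of job name => Array of job names it needs (within the same stage)
  def initialize(jobs)
    @jobs = jobs
  end

  def tsort_each_node(&block)
    @jobs.each_key(&block)
  end

  def tsort_each_child(job_name, &block)
    @jobs.fetch(job_name, []).each(&block)
  end
end

ordering = StageJobOrdering.new('test1' => ['manual1'], 'manual1' => [])
ordering.tsort # => ["manual1", "test1"] -- manual1 is updated first, so test1 can be skipped
```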
Relates to #450395 (closed)
## MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
## How to set up and validate locally
- Enable the `ci_atomic_processing_ordered_update_stage` feature flag. In `bin/rails console`, run `Feature.enable(:ci_atomic_processing_ordered_update_stage)`.
- On this branch, create a `.gitlab-ci.yml`:
```yaml
manual1:
  stage: test
  when: manual
  script: exit 0

test1:
  stage: test
  needs: [manual1]
  script: exit 0
```
- Run the pipeline and see that it is skipped (a console snippet for checking the job statuses is included at the end of this description).
- Now switch the order:
```yaml
test1:
  stage: test
  needs: [manual1]
  script: exit 0

manual1:
  stage: test
  when: manual
  script: exit 0
```
- The pipeline should also be skipped.
- For extra credit, create a really large DAG with this script:

```ruby
jobs = []
200.times do |index|
  jobs << { name: "test#{index}", needs: "test#{index + 1}" }
end
jobs << { name: "test200", when: "manual" }

jobs.each do |job|
  puts "#{job[:name]}:"
  puts "  needs: [#{job[:needs]}]" if job[:needs]
  puts "  when: #{job[:when]}" if job[:when]
  puts "  script: exit 0"
end
```
- Use the configuration generated by that script in an actual pipeline.
- In addition, you can apply this diff to read `/tmp/dag.yml`, to ensure that the SQL queries look similar in `log/test.log`, and to profile the memory usage in `/tmp/profile.txt`:
```diff
diff --git a/spec/services/ci/pipeline_processing/atomic_processing_service_spec.rb b/spec/services/ci/pipeline_processing/atomic_processing_service_spec.rb
index 81f21ed70531..c2ca5d62d0d8 100644
--- a/spec/services/ci/pipeline_processing/atomic_processing_service_spec.rb
+++ b/spec/services/ci/pipeline_processing/atomic_processing_service_spec.rb
@@ -928,6 +928,28 @@ def event_on_pipeline(event)
     end
   end
+  context 'with large DAG' do
+    let(:config) { File.read('/tmp/dag.yml') }
+
+    let(:pipeline) do
+      Ci::CreatePipelineService.new(project, user, { ref: 'master' }).execute(:push).payload
+    end
+
+    before do
+      stub_feature_flags(ci_atomic_processing_ordered_update_stage: true)
+      stub_ci_pipeline_yaml_file(config)
+      Rails.logger.info "=== before process"
+      require 'memory_profiler'
+      report = MemoryProfiler.report { process_pipeline }
+      Rails.logger.info "=== after process"
+      output = File.open('/tmp/profile.txt','w')
+      report.pretty_print(output)
+    end
+
+    it 'does something' do
+      expect(all_builds_names_and_statuses).to eq({})
+    end
+  end
+
   context 'when dependent jobs are listed after job needs' do
     let(:feature_flag_state) { true }
     let(:config) do
```
The memory profile shows we allocate a lot of RAM logging queries and backtraces!
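For reference, here is one way to check the job and pipeline statuses from `bin/rails console` after running the pipelines above. This is a sketch and assumes the pipeline you just created is the most recent one on the instance:

```ruby
# Inspect the most recently created pipeline and its job statuses.
pipeline = Ci::Pipeline.last
pipeline.builds.pluck(:name, :status)
# With the feature flag enabled, expect something like:
# [["manual1", "manual"], ["test1", "skipped"]]
pipeline.status # => "skipped"
```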