Skip to content

Only create join record once the build is assigned to a runner

What does this MR do and why?

Describe in detail what your merge request does and why.

This MR fixes the race condition that caused an S2 incident recently. The logic is behind a FF. The problem was caused by the fact that build.runner_machine = runner_machine creates the join record immediately and not when the build model is saved. This can cause a race condition if there are multiple runners competing for the same job. The first one will create the p_ci_runner_machine_builds join record, and the others will cause duplicate key constraint violation errors. By creating the join record only when the runner has been positively assigned the job, we sidestep the issue.

Note: the specs diff should be viewed in a tool that can ignore whitespace differences, as most of the changes are indentation-related.

Closes #396852 (closed)

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Running the spec against the old code fails tests
......................................................F.............F...................

Failures:

  1) Ci::RegisterJobService#execute when using pending builds table allow shared runners when build owner has been blocked with runner machine specified does not pick the build and does not create join record
     Failure/Error: expect(Ci::RunnerMachineBuild.all).to be_empty
       expected `#<ActiveRecord::Relation [#<Ci::RunnerMachineBuild partition_id: 100, build_id: 105, runner_machine_id: 1>]>.empty?` to be truthy, got false
     Shared Example Group: "handles runner assignment" called from ./spec/services/ci/register_job_service_spec.rb:855
     # ./spec/services/ci/register_job_service_spec.rb:181:in `block (7 levels) in <module:Ci>'
     # ./spec/spec_helper.rb:502:in `block (3 levels) in <top (required)>'
     # ./spec/support/sidekiq_middleware.rb:18:in `with_sidekiq_server_middleware'
     # ./spec/spec_helper.rb:494:in `block (2 levels) in <top (required)>'
     # ./spec/spec_helper.rb:490:in `block (3 levels) in <top (required)>'
     # ./lib/gitlab/application_context.rb:59:in `with_raw_context'
     # ./spec/spec_helper.rb:490:in `block (2 levels) in <top (required)>'
     # ./spec/spec_helper.rb:242:in `block (2 levels) in <top (required)>'
     # ./spec/support/system_exit_detected.rb:7:in `block (2 levels) in <main>'
     # ./spec/support/database/prevent_cross_joins.rb:108:in `block (3 levels) in <main>'
     # ./spec/support/database/prevent_cross_joins.rb:62:in `with_cross_joins_prevented'
     # ./spec/support/database/prevent_cross_joins.rb:108:in `block (2 levels) in <main>'

  2) Ci::RegisterJobService#execute when using pending builds table deleted projects for project runner with runner machine specified does not pick a build
     Failure/Error: expect(Ci::RunnerMachineBuild.all).to be_empty
       expected `#<ActiveRecord::Relation [#<Ci::RunnerMachineBuild partition_id: 100, build_id: 119, runner_machine_id: 3>]>.empty?` to be truthy, got false
     Shared Example Group: "handles runner assignment" called from ./spec/services/ci/register_job_service_spec.rb:855
     # ./spec/services/ci/register_job_service_spec.rb:146:in `block (7 levels) in <module:Ci>'
     # ./spec/spec_helper.rb:502:in `block (3 levels) in <top (required)>'
     # ./spec/support/sidekiq_middleware.rb:18:in `with_sidekiq_server_middleware'
     # ./spec/spec_helper.rb:494:in `block (2 levels) in <top (required)>'
     # ./spec/spec_helper.rb:490:in `block (3 levels) in <top (required)>'
     # ./lib/gitlab/application_context.rb:59:in `with_raw_context'
     # ./spec/spec_helper.rb:490:in `block (2 levels) in <top (required)>'
     # ./spec/spec_helper.rb:242:in `block (2 levels) in <top (required)>'
     # ./spec/support/system_exit_detected.rb:7:in `block (2 levels) in <main>'
     # ./spec/support/database/prevent_cross_joins.rb:108:in `block (3 levels) in <main>'
     # ./spec/support/database/prevent_cross_joins.rb:62:in `with_cross_joins_prevented'
     # ./spec/support/database/prevent_cross_joins.rb:108:in `block (2 levels) in <main>'

Finished in 1 minute 1.65 seconds (files took 38.18 seconds to load)
88 examples, 2 failures

Failed examples:

rspec ./spec/services/ci/register_job_service_spec.rb:178 # Ci::RegisterJobService#execute when using pending builds table allow shared runners when build owner has been blocked with runner machine specified does not pick the build and does not create join record
rspec ./spec/services/ci/register_job_service_spec.rb:142 # Ci::RegisterJobService#execute when using pending builds table deleted projects for project runner with runner machine specified does not pick a build

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Merge request reports

Loading