Defer pipeline creation completion after `head_pipeline_id` update

What does this MR do and why?

Eliminates a race condition between MergeRequests::CreatePipelineWorker and schedule_policy_synchronization where the Redis pipeline creation state (IN_PROGRESS -> SUCCEEDED) is cleared before head_pipeline_id is written to the database, which leads to fail-open approval rules getting prematurely unblocked.

When async pipeline creation is enabled, Ci::CreatePipelineService#execute transitions the Redis state to SUCCEEDED inside the service, but update_head_pipeline (the database write) only happens afterwards in the worker. This creates a window where schedule_policy_synchronization sees pipeline_creating? = false (Redis cleared) and diff_head_pipeline = nil (database not yet written), causing it to incorrectly enqueue UnenforceablePolicyRulesNotificationWorker, which prematurely unblocks fail-open approval rules.

This MR introduces a defer_request_completion flag, gated behind the defer_mr_pipeline_creation_request_completion feature flag. When enabled, the Redis state transition is skipped inside Ci::CreatePipelineService and instead performed by CreatePipelineWorker after update_head_pipeline. This ensures the database is consistent before completion is signalled.

References

Code references

Root cause:

Fix: defer Redis transition to after database write:

How to set up and validate locally

Same setup as !229073 (merged) (steps 1-5).

Reproducing the race condition

To trigger the race, apply the following patch which adds sleep in three places to widen the timing windows:

diff --git a/app/services/merge_requests/after_create_service.rb b/app/services/merge_requests/after_create_service.rb
index 83b17f03e690..1f93df07c2c3 100644
--- a/app/services/merge_requests/after_create_service.rb
+++ b/app/services/merge_requests/after_create_service.rb
@@ -6,6 +6,7 @@ def execute(merge_request)
       merge_request.ensure_merge_request_diff
 
       prepare_for_mergeability(merge_request)
+      sleep(30)
       prepare_merge_request(merge_request)
       mark_merge_request_as_prepared(merge_request)
 
diff --git a/app/workers/merge_requests/create_pipeline_worker.rb b/app/workers/merge_requests/create_pipeline_worker.rb
index 087a3ad1cbaf..34cfbc9b370d 100644
--- a/app/workers/merge_requests/create_pipeline_worker.rb
+++ b/app/workers/merge_requests/create_pipeline_worker.rb
@@ -66,6 +66,7 @@ def perform(project_id, user_id, merge_request_id, params = {})
 
       raise PipelineCreationRetryError, result.message if result&.error? && result.reason == :retriable_error
 
+      sleep(120)
       merge_request.update_head_pipeline
 
       complete_pipeline_creation_request(result, pipeline_creation_request, merge_request) if defer_request_completion
diff --git a/app/workers/merge_requests/update_head_pipeline_worker.rb b/app/workers/merge_requests/update_head_pipeline_worker.rb
index f6c3c0617ced..4c3c878a1bfb 100644
--- a/app/workers/merge_requests/update_head_pipeline_worker.rb
+++ b/app/workers/merge_requests/update_head_pipeline_worker.rb
@@ -12,6 +12,7 @@ class UpdateHeadPipelineWorker
     idempotent!
 
     def handle_event(event)
+      sleep(120)
       pipeline = Ci::Pipeline.in_partition(event.data[:partition_id]).find_by_id(event.data[:pipeline_id])
       return unless pipeline
 

Reproduce the bug (with feature flag disabled)

  1. Disable the feature flag:

    Feature.disable(:defer_mr_pipeline_creation_request_completion)
  2. Apply the patch above and restart Sidekiq

  3. Open a new MR (add any file change)

  4. Wait ~30s for AfterCreateService to wake up and check in Rails console:

    mr = MergeRequest.last
    mr.scan_result_policy_violations.pluck(:id, :status)
    # => [[<id>, "warn"]]  # <-- prematurely unblocked

Verify the fix (with feature flag enabled)

  1. Enable the feature flag:

    Feature.enable(:defer_mr_pipeline_creation_request_completion)
  2. Open a new MR

  3. Wait ~30s and check in Rails console:

    mr = MergeRequest.last
    mr.scan_result_policy_violations.pluck(:id, :status)
    # => [[<id>, "running"]]  # <-- correctly deferred
  4. The MR should NOT be mergeable while the pipeline is running

  5. After the pipeline completes, the rule is evaluated normally

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Dominic Bauer

Merge request reports

Loading