Defer pipeline creation completion after `head_pipeline_id` update
What does this MR do and why?
Eliminates a race condition between MergeRequests::CreatePipelineWorker and schedule_policy_synchronization where the Redis pipeline creation state (IN_PROGRESS -> SUCCEEDED) is cleared before head_pipeline_id is written to the database, which leads to fail-open approval rules getting prematurely unblocked.
When async pipeline creation is enabled, Ci::CreatePipelineService#execute transitions the Redis state to SUCCEEDED inside the service, but update_head_pipeline (the database write) only happens afterwards in the worker. This creates a window where schedule_policy_synchronization sees pipeline_creating? = false (Redis cleared) and diff_head_pipeline = nil (database not yet written), causing it to incorrectly enqueue UnenforceablePolicyRulesNotificationWorker, which prematurely unblocks fail-open approval rules.
This MR introduces a defer_request_completion flag, gated behind the defer_mr_pipeline_creation_request_completion feature flag. When enabled, the Redis state transition is skipped inside Ci::CreatePipelineService and instead performed by CreatePipelineWorker after update_head_pipeline. This ensures the database is consistent before completion is signalled.
References
- Follow-up to !230485 (merged)
- Related to !229073 (merged)
- Related to #578352
Code references
Root cause:
Ci::CreatePipelineService#executetransitions RedisIN_PROGRESS->SUCCEEDEDinside the serviceCreatePipelineWorker#performcallsupdate_head_pipeline(database write) only afterCreatePipelineServicereturnsschedule_policy_synchronizationchecksdiff_head_pipelineandpipeline_creating?. Both can benil/falsein the gap between Redis clearing and database write
Fix: defer Redis transition to after database write:
execute_asyncpassesdefer_request_completion: truewhen the feature flag is enabledCi::CreatePipelineService#executeskips Redis transition whendefer_request_completionis setCreatePipelineWorker#performcompletes the Redis transition afterupdate_head_pipelineMergeRequests::CreatePipelineService#cannot_create_pipeline_erroralso skips Redis transition when deferred- EE
create_merged_result_pipeline_forforwards the flag toCi::CreatePipelineService
How to set up and validate locally
Same setup as !229073 (merged) (steps 1-5).
Reproducing the race condition
To trigger the race, apply the following patch which adds sleep in three places to widen the timing windows:
diff --git a/app/services/merge_requests/after_create_service.rb b/app/services/merge_requests/after_create_service.rb
index 83b17f03e690..1f93df07c2c3 100644
--- a/app/services/merge_requests/after_create_service.rb
+++ b/app/services/merge_requests/after_create_service.rb
@@ -6,6 +6,7 @@ def execute(merge_request)
merge_request.ensure_merge_request_diff
prepare_for_mergeability(merge_request)
+ sleep(30)
prepare_merge_request(merge_request)
mark_merge_request_as_prepared(merge_request)
diff --git a/app/workers/merge_requests/create_pipeline_worker.rb b/app/workers/merge_requests/create_pipeline_worker.rb
index 087a3ad1cbaf..34cfbc9b370d 100644
--- a/app/workers/merge_requests/create_pipeline_worker.rb
+++ b/app/workers/merge_requests/create_pipeline_worker.rb
@@ -66,6 +66,7 @@ def perform(project_id, user_id, merge_request_id, params = {})
raise PipelineCreationRetryError, result.message if result&.error? && result.reason == :retriable_error
+ sleep(120)
merge_request.update_head_pipeline
complete_pipeline_creation_request(result, pipeline_creation_request, merge_request) if defer_request_completion
diff --git a/app/workers/merge_requests/update_head_pipeline_worker.rb b/app/workers/merge_requests/update_head_pipeline_worker.rb
index f6c3c0617ced..4c3c878a1bfb 100644
--- a/app/workers/merge_requests/update_head_pipeline_worker.rb
+++ b/app/workers/merge_requests/update_head_pipeline_worker.rb
@@ -12,6 +12,7 @@ class UpdateHeadPipelineWorker
idempotent!
def handle_event(event)
+ sleep(120)
pipeline = Ci::Pipeline.in_partition(event.data[:partition_id]).find_by_id(event.data[:pipeline_id])
return unless pipeline
Reproduce the bug (with feature flag disabled)
-
Disable the feature flag:
Feature.disable(:defer_mr_pipeline_creation_request_completion) -
Apply the patch above and restart Sidekiq
-
Open a new MR (add any file change)
-
Wait ~30s for
AfterCreateServiceto wake up and check in Rails console:mr = MergeRequest.last mr.scan_result_policy_violations.pluck(:id, :status) # => [[<id>, "warn"]] # <-- prematurely unblocked
Verify the fix (with feature flag enabled)
-
Enable the feature flag:
Feature.enable(:defer_mr_pipeline_creation_request_completion) -
Open a new MR
-
Wait ~30s and check in Rails console:
mr = MergeRequest.last mr.scan_result_policy_violations.pluck(:id, :status) # => [[<id>, "running"]] # <-- correctly deferred -
The MR should NOT be mergeable while the pipeline is running
-
After the pipeline completes, the rule is evaluated normally
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.