Broadcast pipeline creation errors only after Sidekiq retries are exhausted
Issue: Pipeline creation requests randomly transition ... (#581371) • Sahil Sharma • 18.8
What does this MR do and why?
This MR fixes a race condition in async pipeline creation that causes pipeline creation requests to incorrectly transition from IN_PROGRESS → FAILED → SUCCEEDED.
Problem
When creating pipelines asynchronously via MergeRequests::CreatePipelineService#execute_async, validation failures were immediately broadcast to the frontend with a FAILED status via the ciPipelineCreationRequestsUpdated GraphQL subscription. However, many of these failures were retriable errors (such as race conditions) that would succeed on retry, so the status flipped from FAILED to SUCCEEDED within a fraction of a second.
This created a confusing user experience: users would see a "failed" alert that immediately resolved itself. Errors that may still succeed on retry should only be broadcast to the frontend once Sidekiq retries have been exhausted.
Solution
This MR implements a two-part solution:
1. Differentiate between retriable and permanent errors
We now distinguish between errors that can be retried and those that will always fail:
Retriable errors (temporary race conditions):
- Duplicate pipeline still in progress (running or pending)
- Merge pipelines not enabled (can be a timing issue)
Permanent errors (will always fail):
- Merge conflicts
- Permission issues
- Invalid .gitlab-ci.yml files
- Duplicate pipelines that have already completed (succeeded or failed)
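The classification above can be sketched as a small lookup, assuming each validation failure is tagged with a reason symbol and, for duplicates, the status of the existing pipeline. The `ErrorClassifier` module, the reason symbols, and the `duplicate_pipeline_status` keyword are hypothetical names for illustration, not the MR's actual code.

```ruby
# Hypothetical sketch: classify pipeline creation failures as retriable or permanent.
# Reason symbols and status strings are illustrative assumptions, not GitLab's API.
module ErrorClassifier
  # Failures that may clear up on their own, so the job should be retried.
  RETRIABLE_REASONS = %i[merge_pipelines_not_enabled].freeze

  # Failures that will fail again on every retry.
  PERMANENT_REASONS = %i[merge_conflict permission_denied invalid_ci_config].freeze

  # A duplicate pipeline is retriable only while the existing one is still in progress.
  IN_PROGRESS_STATUSES = %w[running pending].freeze
  COMPLETED_STATUSES = %w[success failed].freeze

  module_function

  # Returns :retriable or :permanent; raises ArgumentError for unknown input.
  def classify(reason, duplicate_pipeline_status: nil)
    if reason == :duplicate_pipeline
      return :retriable if IN_PROGRESS_STATUSES.include?(duplicate_pipeline_status)
      return :permanent if COMPLETED_STATUSES.include?(duplicate_pipeline_status)

      raise ArgumentError, "unknown duplicate pipeline status: #{duplicate_pipeline_status.inspect}"
    end

    return :retriable if RETRIABLE_REASONS.include?(reason)
    return :permanent if PERMANENT_REASONS.include?(reason)

    raise ArgumentError, "unclassified reason: #{reason.inspect}"
  end
end

ErrorClassifier.classify(:duplicate_pipeline, duplicate_pipeline_status: 'running') # => :retriable
ErrorClassifier.classify(:duplicate_pipeline, duplicate_pipeline_status: 'success') # => :permanent
ErrorClassifier.classify(:invalid_ci_config)                                         # => :permanent
```

Raising on an unknown reason is a deliberate choice in this sketch: it forces every new failure type to be classified explicitly instead of silently defaulting to one behavior.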
2. Broadcast errors only after retries are exhausted
- Retriable errors: Don't broadcast FAILED status immediately. Let Sidekiq retry the job, and only broadcast FAILED if all retries are exhausted.
- Permanent errors: Broadcast FAILED status immediately, since we know these will never succeed. This avoids unnecessary delays in showing the error to users.
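The two broadcast rules can be modeled with a minimal in-process simulation, assuming a job is retried up to a fixed number of times. In real Sidekiq, retries are separate executions with backoff, and the "retries exhausted" step would correspond to the `sidekiq_retries_exhausted` hook. `SimulatedJob`, `RetriableError`, `PermanentError`, `max_retries`, and the `broadcast` lambda are hypothetical stand-ins for the worker and the GraphQL subscription trigger, not the MR's implementation.

```ruby
# Hypothetical simulation of the broadcast policy; not GitLab's or Sidekiq's code.
class RetriableError < StandardError; end
class PermanentError < StandardError; end

class SimulatedJob
  def initialize(max_retries:, broadcast:)
    @max_retries = max_retries # retries allowed after the first attempt
    @broadcast = broadcast     # stand-in for the GraphQL subscription trigger
  end

  # Runs the block up to 1 + max_retries times, mimicking Sidekiq retries
  # in-process (real Sidekiq retries are separate executions with backoff).
  def perform
    attempts = 0
    begin
      attempts += 1
      yield attempts
      @broadcast.call(:succeeded)
    rescue PermanentError => e
      # Permanent error: report FAILED immediately and never retry.
      @broadcast.call(:failed, e.message)
    rescue RetriableError => e
      # Retriable error: stay silent and try again while retries remain.
      retry if attempts <= @max_retries

      # Retries exhausted (cf. Sidekiq's sidekiq_retries_exhausted hook): report FAILED now.
      @broadcast.call(:failed, e.message)
    end
  end
end

events = []
job = SimulatedJob.new(max_retries: 3, broadcast: ->(status, message = nil) { events << [status, message] })

# A race condition that clears on the second attempt produces no intermediate FAILED.
job.perform { |attempt| raise RetriableError, 'duplicate pipeline running' if attempt == 1 }
events # => [[:succeeded, nil]]
```

A permanent error passed to the same job produces exactly one FAILED broadcast on the first attempt, while a retriable error that never clears produces one FAILED broadcast after the last retry.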
This prevents permanent errors from being retried for extended periods when we know from the start they will always fail, while also preventing confusing status transitions for retriable errors that will likely succeed.
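To give a sense of scale for "extended periods": assuming Sidekiq's default retry delay of `count**4 + 15` seconds plus random jitter (with `count` starting at 0), the minimum total wait before retries are exhausted grows quickly with the retry budget. The retry count actually configured for the pipeline creation worker is not stated in this MR, so it is a parameter here, and `min_total_backoff_seconds` is a hypothetical helper.

```ruby
# Lower bound on total backoff before a job's retries are exhausted, assuming
# Sidekiq's default per-retry delay of count**4 + 15 seconds plus jitter in
# [0, 9 * (count + 1)]. Jitter is ignored, so these values are minimums.
def min_total_backoff_seconds(retries)
  (0...retries).sum { |count| count**4 + 15 }
end

min_total_backoff_seconds(3)  # => 62
min_total_backoff_seconds(25) # => 1763395 (about 20.4 days)
```

Even a small retry budget delays a permanent error by about a minute, and Sidekiq's default of 25 retries stretches it to weeks, which is why permanent errors skip the retry path entirely.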
References
Screenshots or screen recordings
| Before | After |
|---|---|
| Screen_Recording_2025-11-18_at_10.06.29 | Screen_Recording_2025-12-11_at_18.50.34 |
How to set up and validate locally
- Enable the feature flag in a Rails console:
Feature.enable(:ci_pipeline_creation_requests_realtime)
- Create a merge request
- Use the GraphQL explorer to subscribe to pipeline creation updates:
subscription {
  ciPipelineCreationRequestsUpdated(mergeRequestId: "gid://gitlab/MergeRequest/YOUR_MR_ID") {
    id
    iid
    pipelineCreationRequests {
      status
      pipelineId
      error
    }
  }
}
- Test retriable errors: Rapidly trigger multiple pipeline creations (click "Run pipeline" multiple times). Verify that you don't see intermediate FAILED states caused by race conditions.
- Test permanent errors: Create scenarios with invalid YAML or merge conflicts. Verify that these fail immediately and show the error to users without delay.
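When watching the subscription output during validation, a small helper can flag the regression this MR targets: a FAILED status that is later followed by SUCCEEDED for the same request. It assumes the `status` values observed for one request have been collected in arrival order as strings; `flip_flops?` is a hypothetical helper, not part of GitLab.

```ruby
# Hypothetical helper for manual validation: given the statuses observed for
# one pipeline creation request, in arrival order, report whether a FAILED
# status was later replaced by SUCCEEDED (the transition this MR removes).
def flip_flops?(statuses)
  failed_at = statuses.index('FAILED')
  return false if failed_at.nil?

  statuses[(failed_at + 1)..].include?('SUCCEEDED')
end

flip_flops?(%w[IN_PROGRESS FAILED SUCCEEDED]) # => true (behavior before this MR)
flip_flops?(%w[IN_PROGRESS SUCCEEDED])        # => false
flip_flops?(%w[IN_PROGRESS FAILED])           # => false
```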
Related issues
Closes #581371
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.