pipeline stuck pending owing to PipelineProcessWorker called by Ci::PipelineBridgeStatusWorker being deduplicated - "dropped until executing"
Summary
Customer raised a ticket because a pipeline in their merge train got stuck in pending state. Link for GitLab team members.
- The parent pipeline contains only a trigger job for a child pipeline.
- resource_group is set on the customer's trigger job.
- They re-ran it, and it was successful.
Kibana logs (looking for jobs with an argument of the pipeline ID)
- failed pipeline - https://log.gprd.gitlab.net/goto/3de5a5a0-5548-11ed-8d37-e9a2f393ea2a
- successful - https://log.gprd.gitlab.net/goto/4f4e9630-5548-11ed-b0ec-930003e0679c
The log entries start with jobs initiated by MergeTrains::RefreshWorker
Looking at them side-by-side
- There's a delay (see [1] below) otherwise ..
- The same jobs got kicked off for both pipelines, up to a point
- Two PipelineProcessWorker jobs behave differently.
- There's a PipelineProcessWorker initiated by Ci::ResourceGroups::AssignResourceFromResourceGroupWorker in both pipelines, but in the failed pipeline it takes about an hour for this to trigger. In the successful one, it runs within minutes or seconds of all the other jobs kicked off by MergeTrains::RefreshWorker.
- Nothing in the logs indicates why; timings look correct, with no scheduling latency.
- This doesn't affect the pipeline run time. On the contrary, end to end, the problematic pipeline runs faster - two hours instead of three - as measured from when MergeTrains::RefreshWorker kicks everything off to the second problematic PipelineProcessWorker ...
- When the PipelineProcessWorker is called by Ci::PipelineBridgeStatusWorker once the child pipeline has completed, Sidekiq deduplicates it in the pipeline that hangs. In the successful one, it executes, and a lot of other jobs also run as a result.
PipelineProcessWorker JID-1e88797bef20e60017d10e95: deduplicated: dropped until executing
Possibly related:
- #342123 (closed)
  - AssignResourceFromResourceGroupWorker is involved in running some of the PipelineProcessWorker jobs
  - deduplication.type: dropped until executing
- !71979 (merged) because the following elements came up in the investigation:
  - until_executed strategy
  - resource groups (resource_group is set on the customer's trigger job)
  - merge trains
Steps to reproduce
Example Project
What is the current bug behavior?
Some set of events causes the PipelineProcessWorker job called by Ci::PipelineBridgeStatusWorker to be deduplicated.
What is the expected correct behavior?
In this situation, the PipelineProcessWorker job called by Ci::PipelineBridgeStatusWorker executes.
Relevant logs and/or screenshots
Output of checks
This bug happens on GitLab.com
GitLab Enterprise Edition 15.6.0-pre a334075e
Possible fixes
One user experiencing this issue was able to successfully use the following workaround:
- Rebase the problem branch
- New pipeline is created
- Merge request is able to be successfully merged
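The report does not say how the user rebased. If the rebase needs to be scripted, one option is the merge request rebase endpoint of the GitLab REST API. The sketch below only builds the request and does not send it. The host, project ID, merge request IID, token placeholder, and the helper name build_rebase_request are all hypothetical.

```ruby
require "net/http"
require "uri"

# Build (but do not send) a request asking GitLab to rebase a merge request.
# base_url, project_id, merge_request_iid and token are placeholders.
def build_rebase_request(base_url, project_id, merge_request_iid, token)
  uri = URI("#{base_url}/api/v4/projects/#{project_id}/merge_requests/#{merge_request_iid}/rebase")
  request = Net::HTTP::Put.new(uri)
  request["PRIVATE-TOKEN"] = token
  [uri, request]
end

uri, request = build_rebase_request("https://gitlab.example.com", 42, 7, "<your-token>")

# To actually send it (not executed here):
# response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
```

After a successful rebase, a new pipeline should be created for the merge request, matching the second step of the workaround.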