Pipeline jobs that depend on a Resource Group could get stuck
Problem
Originally reported in this comment.
It looks like an intermittent problem can occur with Resource Groups, in which pipeline jobs get stuck in the waiting_for_resource status and never proceed.
This could be a race condition caused by the asynchronous processing in AssignResourceFromResourceGroupWorker. We need further investigation into the actual root cause of this issue.
This problem can occur only with the oldest_first or newest_first process modes.
Additional context
In general, this problem isn't noticeable, because the system re-checks the upcoming jobs every time a new job is enqueued to the resource group. So as long as you keep running pipelines, the jobs that hit the race condition self-heal.
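The self-healing behavior can be pictured with a small simulation. This is only a sketch in Python, not GitLab code: the `ResourceGroup` class, its `enqueue`, `assign_next`, and `release` methods, and the job names are all hypothetical, and it assumes the re-check simply hands a free resource to the oldest waiting job.

```python
class ResourceGroup:
    """Hypothetical single-resource group; not GitLab's implementation."""

    def __init__(self):
        self.holder = None   # name of the job currently holding the resource
        self.waiting = []    # jobs in waiting_for_resource, oldest first

    def assign_next(self):
        """Re-check upcoming jobs and give a free resource to the oldest one."""
        if self.holder is None and self.waiting:
            self.holder = self.waiting.pop(0)

    def enqueue(self, job):
        """A new job joins the group, which triggers a re-check (assumed)."""
        self.waiting.append(job)
        self.assign_next()

    def release(self):
        """The holder finished; the resource becomes free again."""
        self.holder = None


group = ResourceGroup()

# Simulate the bug: a job is waiting and the resource is free, but the
# re-check that should have picked the job up never ran.
group.waiting.append("deploy-1")

# A later pipeline enqueues another job; the re-check picks up the stuck one.
group.enqueue("deploy-2")

print(group.holder)   # deploy-1
print(group.waiting)  # ['deploy-2']
```

In this model the stuck job only recovers because something else enqueues a job later. If no further pipelines run, nothing triggers the re-check and the job stays in waiting_for_resource.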
Analysis & Reproduction
I was able to reproduce this bug on https://gitlab.com/dosuken-org/developer-group/test-resource-group/-/pipelines. It looks like there is still a race condition. For example, a user has the following .gitlab-ci.yml:
    build:
      stage: build
      resource_group: production
      script: echo

    deploy:
      stage: deploy
      resource_group: production
      script: echo
Run two pipelines. When a build job finishes, the following internal processes happen:

- PUT /api/:version/jobs/:id marks the build job as success
  - Invokes AssignResourceFromResourceGroupWorker (in order to let the next job allocate a resource)
  - Invokes PipelineProcessWorker (in order to proceed the pipeline stages)
  - At this moment, build (status: running) -> deploy (status: created)

async thread 1

- AssignResourceFromResourceGroupWorker starts
  - Tries to allocate a resource to deploy, but deploy is still created
- AssignResourceFromResourceGroupWorker finishes

async thread 2

- PipelineProcessWorker starts
  - Changes the deploy status to waiting_for_resource
  - Invokes AssignResourceFromResourceGroupWorker (in order to try to allocate a resource for the deploy job)
    - This job could be deduplicated due to the deduplicate :until_executed strategy.
- PipelineProcessWorker finishes
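One interleaving consistent with the steps above can be replayed deterministically. The following is a sketch in Python, not GitLab code: `DedupQueue` and `assign_resource` are hypothetical stand-ins. It assumes that until-executed deduplication drops an identical enqueue while an earlier one is still queued or running, and that the second enqueue arrives before the first worker run has finished.

```python
class DedupQueue:
    """Assumed until-executed deduplication: while one job is queued or
    running, identical enqueues are dropped."""

    def __init__(self):
        self.in_flight = False
        self.dropped = 0

    def enqueue(self):
        if self.in_flight:
            self.dropped += 1
            return False
        self.in_flight = True
        return True

    def finish(self):
        self.in_flight = False


def assign_resource(jobs, group):
    """Stand-in for the assign worker: give a free resource to a waiting job."""
    if group["holder"] is not None:
        return
    for name, status in jobs.items():
        if status == "waiting_for_resource":
            jobs[name] = "pending"
            group["holder"] = name
            return


jobs = {"build": "success", "deploy": "created"}
group = {"holder": None}  # build has released the resource
queue = DedupQueue()

# The job update request schedules the assign worker.
queue.enqueue()

# Async thread 1: the worker runs while deploy is still "created",
# so there is nothing to assign.
assign_resource(jobs, group)

# Async thread 2: the pipeline processor advances deploy and schedules the
# assign worker again, before thread 1 has finished.
jobs["deploy"] = "waiting_for_resource"
accepted = queue.enqueue()

# Thread 1 finishes; no further run is scheduled.
queue.finish()

print(accepted)         # False
print(jobs["deploy"])   # waiting_for_resource
print(group["holder"])  # None
```

Under these assumptions the resource is free and deploy is waiting for it, yet no worker run is left to connect the two. If the second enqueue lands after `queue.finish()` instead, it is accepted and the job proceeds, which would match the intermittent nature of the report.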
Proposal
Please see #342123 (closed)