Pipeline jobs that depend on a resource group can get stuck
Problem
Originally reported in this comment.
An intermittent problem can occur with resource groups: pipeline jobs can get stuck in the waiting_for_resource status and never proceed.
This could be a race condition caused by the asynchronous processing in AssignResourceFromResourceGroupWorker. The actual root cause needs further investigation.
This problem can occur only with the oldest_first or newest_first process modes.
Additional context
In general, this problem isn't noticeable because the system re-checks the upcoming jobs every time a new job is enqueued to the resource group. So as long as you keep running pipelines, jobs that hit the race condition self-heal.
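The re-check described above can be pictured with a small model. This is only a sketch, not GitLab's implementation: the function names, the use of job ids as a stand-in for creation order, and the single free resource are all assumptions made for illustration.

```python
from typing import List, Optional


def upcoming_job(waiting_ids: List[int], process_mode: str) -> Optional[int]:
    """Pick the waiting job that should get the resource next.

    Assumption: a larger id means the job was created later.
    """
    if not waiting_ids:
        return None
    if process_mode == "oldest_first":
        return min(waiting_ids)
    if process_mode == "newest_first":
        return max(waiting_ids)
    raise ValueError(f"unsupported process mode: {process_mode}")


def enqueue(waiting_ids: List[int], new_id: int,
            resource_free: bool, process_mode: str) -> Optional[int]:
    """Add a job, then re-check the whole waiting list (hypothetical helper)."""
    waiting_ids.append(new_id)
    if not resource_free:
        return None
    return upcoming_job(waiting_ids, process_mode)


# Job 101 was left behind by the race; enqueueing job 205 triggers a re-check.
print(enqueue([101], 205, True, "oldest_first"))  # 101
print(enqueue([101], 205, True, "newest_first"))  # 205
```

In this model the stuck job is picked up immediately under oldest_first, while under newest_first it is reconsidered at a later re-check, once the newer job has been served.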
Analysis & Reproduction
I was able to reproduce this bug on https://gitlab.com/dosuken-org/developer-group/test-resource-group/-/pipelines. It looks like there is still a race condition. For example, suppose a user has the following .gitlab-ci.yml:
build:
  stage: build
  resource_group: production
  script: echo

deploy:
  stage: deploy
  resource_group: production
  script: echo
Run two pipelines. When a build job finishes, the following internal processes happen:
- PUT /api/:version/jobs/:id marks the build job to be success
  - Invokes AssignResourceFromResourceGroupWorker (in order to let the next job allocate a resource)
  - Invokes PipelineProcessWorker (in order to proceed the pipeline stages)
  - At this moment, build (status: running) -> deploy (status: created)

async thread 1
- AssignResourceFromResourceGroupWorker starts
  - Try to allocate a resource on deploy but still created
- AssignResourceFromResourceGroupWorker finishes

async thread 2
- PipelineProcessWorker starts
  - Change the deploy status to waiting_for_resource
  - Invokes AssignResourceFromResourceGroupWorker (in order to try to allocate a resource for the deploy job)
    - This job could be deduplicated due to deduplicate :until_executed strategy.
- PipelineProcessWorker finishes
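The interleaving above can be replayed deterministically in a toy model. This is a sketch under stated assumptions, not GitLab code: the classes, the dedup key "assign:production", and the rule that a key stays held from enqueue until the run has finished are all hypothetical stand-ins for the workers and the deduplicate :until_executed strategy.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Job:
    name: str
    status: str


@dataclass
class ResourceGroup:
    jobs: List[Job]
    holder: Optional[Job] = None  # job currently holding the single resource


@dataclass
class DedupQueue:
    """Toy queue: a key is held from enqueue until its run has finished."""
    held: Set[str] = field(default_factory=set)

    def enqueue(self, key: str) -> bool:
        if key in self.held:
            return False  # duplicate is dropped
        self.held.add(key)
        return True

    def finished(self, key: str) -> None:
        self.held.discard(key)


def assign_resource(group: ResourceGroup) -> None:
    """Give the free resource to the first job that is waiting for it."""
    if group.holder is not None:
        return
    for job in group.jobs:
        if job.status == "waiting_for_resource":
            group.holder, job.status = job, "pending"
            return


def simulate() -> Tuple[Job, ResourceGroup, bool]:
    build = Job("build", "running")
    deploy = Job("deploy", "created")
    group = ResourceGroup(jobs=[build, deploy], holder=build)
    queue = DedupQueue()
    key = "assign:production"  # hypothetical dedup key

    # Request: build succeeds, frees the resource, schedules the assign worker.
    build.status, group.holder = "success", None
    queue.enqueue(key)

    # Thread 1 checks first: deploy is still "created", so nothing is assigned.
    assign_resource(group)
    # Thread 2 moves deploy forward and tries to schedule the worker again,
    # but thread 1 has not finished yet, so the duplicate is dropped.
    deploy.status = "waiting_for_resource"
    accepted = queue.enqueue(key)
    # Thread 1 finishes only now, releasing the key too late.
    queue.finished(key)
    return deploy, group, accepted


deploy, group, accepted = simulate()
print(deploy.status, group.holder, accepted)
# prints: waiting_for_resource None False
```

In this model the resource is free and deploy is waiting, yet no assignment run is pending. Any later call to assign_resource(group) would move deploy to pending.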
Proposal
Please see #342123 (closed)