
Pipeline jobs that depend on a resource group can get stuck

Problem

Originally reported in this comment.

An intermittent problem can occur with resource groups: pipeline jobs can get stuck in the waiting_for_resource status and never proceed.

This could be a race condition caused by the asynchronous processing in AssignResourceFromResourceGroupWorker. We need further investigation into the actual root cause of this issue.

This problem can occur only with the oldest_first or newest_first process modes.

Additional context

In general, this problem isn't noticeable because the system re-checks the upcoming jobs every time a new job is enqueued to the resource group. As long as you keep running pipelines, the jobs that hit the race condition self-heal.
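The self-healing behavior can be sketched as follows. This is a minimal illustration, not GitLab's implementation: the function names, the job names (`deploy-1`, `build-2`), and the list-based waiting queue are all assumptions made for the example, and only oldest_first ordering is modeled.

```python
from typing import List, Optional, Tuple


def recheck_upcoming_jobs(
    holder: Optional[str], waiting: List[str]
) -> Tuple[Optional[str], List[str]]:
    """If the resource is free, hand it to the oldest waiting job (oldest_first)."""
    if holder is None and waiting:
        return waiting[0], waiting[1:]
    return holder, waiting


def enqueue_job(
    holder: Optional[str], waiting: List[str], new_job: str
) -> Tuple[Optional[str], List[str]]:
    """Add a job to the waiting list, then re-check the group as a side effect."""
    return recheck_upcoming_jobs(holder, waiting + [new_job])


if __name__ == "__main__":
    # Hypothetical stuck state: the resource is free, but nothing has
    # re-checked the group since "deploy-1" started waiting.
    holder, waiting = None, ["deploy-1"]

    # A job from a later pipeline is enqueued, which triggers the re-check.
    holder, waiting = enqueue_job(holder, waiting, "build-2")
    print(holder, waiting)  # deploy-1 ['build-2']
```

In this model the stuck job is released only because another job happened to be enqueued; without further pipeline activity it would keep waiting.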

Analysis & Reproduction

I was able to reproduce this bug on https://gitlab.com/dosuken-org/developer-group/test-resource-group/-/pipelines. It looks like there is still a race condition. For example, suppose a user has the following .gitlab-ci.yml:

build:
  stage: build
  resource_group: production
  script: echo

deploy:
  stage: deploy
  resource_group: production
  script: echo

Run two pipelines. When a build job finishes, the following internal processes happen:

  1. PUT /api/:version/jobs/:id marks the build job as success
    1. Invokes AssignResourceFromResourceGroupWorker (to let the next job allocate a resource)
    2. Invokes PipelineProcessWorker (to proceed with the pipeline stages)
    3. At this moment, build (status: running) -> deploy (status: created)

async thread 1

  1. AssignResourceFromResourceGroupWorker starts
  2. Tries to allocate a resource for deploy, but deploy is still created, so nothing is allocated
  3. AssignResourceFromResourceGroupWorker finishes

async thread 2

  1. PipelineProcessWorker starts
  2. Changes the deploy status to waiting_for_resource
  3. Invokes AssignResourceFromResourceGroupWorker (to try to allocate a resource for the deploy job)
    1. This job could be deduplicated due to the deduplicate :until_executed strategy.
  4. PipelineProcessWorker finishes
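The interleaving above can be replayed deterministically with a small model. This is a sketch under stated assumptions, not GitLab code: `World`, `enqueue_assign_worker`, `assign_worker_body`, and `simulate_race` are hypothetical names, the deduplication key is reduced to a single boolean that is held until the worker finishes, and the `pending` status after a successful assignment is an assumption of the model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    name: str
    status: str = "created"


@dataclass
class World:
    """Hypothetical in-memory stand-in for the resource group and worker queue."""
    deploy: Job = field(default_factory=lambda: Job("deploy"))
    resource_holder: Optional[str] = None  # None means the resource is free
    assign_key_held: bool = False          # models the deduplication key
    events: List[str] = field(default_factory=list)


def enqueue_assign_worker(world: World) -> bool:
    """Enqueue the assign worker unless an identical one has not finished yet."""
    if world.assign_key_held:
        world.events.append("assign worker deduplicated")
        return False
    world.assign_key_held = True
    world.events.append("assign worker enqueued")
    return True


def assign_worker_body(world: World) -> None:
    """Give the free resource to deploy only if it is already waiting for it."""
    if world.resource_holder is None and world.deploy.status == "waiting_for_resource":
        world.resource_holder = world.deploy.name
        world.deploy.status = "pending"  # assumed next status in this model
        world.events.append("resource assigned to deploy")
    else:
        world.events.append("nothing to assign")


def simulate_race() -> World:
    world = World()
    # The build job finished: its resource is free and the assign worker is enqueued.
    enqueue_assign_worker(world)
    # Thread 1: the assign worker runs while deploy is still "created".
    assign_worker_body(world)
    # Thread 2: PipelineProcessWorker moves deploy forward before thread 1 finishes.
    world.deploy.status = "waiting_for_resource"
    enqueue_assign_worker(world)  # dropped: thread 1 still holds the key
    # Thread 1: the assign worker finishes and releases the key.
    world.assign_key_held = False
    return world


if __name__ == "__main__":
    result = simulate_race()
    print(result.deploy.status, result.resource_holder)
    for event in result.events:
        print("-", event)
```

In this model the deploy job ends in waiting_for_resource while the resource is free and no assign worker is queued, which matches the stuck state described in the problem statement.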

Proposal

Please see #342123 (closed)

Edited by Shinya Maeda