Fix Resource Groups are not refreshed when a pipeline is canceled

Shinya Maeda requested to merge fix-resource-group-stuck-issue into master

What does this MR do and why?

This ~bug was found while investigating this issue.

Resource Groups are refreshed (re-checked) when AssignResourceFromResourceGroupWorker runs. This worker is idempotent and safe to call any number of times. We expect this worker to always run after a job has finished; however, it currently does not run when the finished job was not retaining a resource, i.e. when release_resource_from returns false.

In some cases, a finished job legitimately does not retain a resource, for example when a job is canceled. Suppose a user accidentally configures a bad pipeline that runs into a deadlock (please see the documentation in this MR for more info). They want to cancel the problematic pipeline so that the other pipelines can resume; however, the problematic pipeline has not retained a resource yet, so the resource group is not refreshed and the pipelines stay stuck. This MR ensures that AssignResourceFromResourceGroupWorker always runs, so that all the pipelines can recover from the problematic configuration.
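The change in control flow can be illustrated with a small, self-contained sketch. This is not GitLab's implementation: `ToyResourceGroup`, `refresh`, `finish_job_before_fix` and `finish_job_after_fix` are hypothetical names, and `refresh` only counts calls as a stand-in for AssignResourceFromResourceGroupWorker. Only `release_resource_from` and its true/false result are taken from the description above.

```ruby
# Hypothetical stand-in for a resource group with a single resource.
class ToyResourceGroup
  attr_reader :refresh_count

  def initialize(holder: nil)
    @holder = holder       # job currently retaining the resource, if any
    @refresh_count = 0     # how many times the group has been re-checked
  end

  # Returns true only when `job` was actually retaining the resource.
  def release_resource_from(job)
    return false unless @holder == job

    @holder = nil
    true
  end

  # Stand-in for AssignResourceFromResourceGroupWorker; it only counts calls.
  def refresh
    @refresh_count += 1
  end
end

# Behavior described as current: refresh only if a resource was released.
def finish_job_before_fix(group, job)
  group.refresh if group.release_resource_from(job)
end

# Behavior described as the fix: always refresh after a job finishes.
def finish_job_after_fix(group, job)
  group.release_resource_from(job)
  group.refresh
end

# A canceled job that never retained the resource (another job holds it).
before = ToyResourceGroup.new(holder: :other_job)
finish_job_before_fix(before, :canceled_job)

after = ToyResourceGroup.new(holder: :other_job)
finish_job_after_fix(after, :canceled_job)

puts before.refresh_count  # 0: the group is never re-checked
puts after.refresh_count   # 1: the group is re-checked regardless
```

Because the worker is described as idempotent, calling it unconditionally in the second variant should be harmless even when there is nothing to assign.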

Related #342123 (closed)

Screenshots or screen recordings

With this fix, the pipelines can be resumed by canceling the problematic pipeline.

Peek_2021-10-01_17-54

Without this fix, the pipelines stay stuck even after the problematic pipeline is canceled.

Peek_2021-10-01_18-02

How to set up and validate locally

Run a pipeline with the following configuration:

.rg: 
  resource_group: rg-test
  script: echo

# BAD
trigger-downstream-2:
  stage: test
  trigger:
    include: child-with-resource-group.yml
    strategy: depend

deploy-1:
  extends: .rg
  stage: deploy

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Shinya Maeda