Child pipelines with manual jobs are unintuitive with strategy: depend. Pipelines can easily get stuck in running state.

Summary

  • The pipeline triggers child pipelines.
  • strategy: depend is in use on the bridge/trigger job.
  • The child pipelines contain only manual jobs.
  • allow_failure: is set to true, set to false, or not specified on the bridge job and on the child pipeline jobs.
  • The child pipeline jobs are set to when: manual.
  • The manual job has not been run.

Outcome:

  • The parent pipeline is either stuck running (50% of the configurations tested), failed (25%), or (!) passed (25%).
  • The bridge job is either pending or failed (50/50).

If allow_failure is not set explicitly, the following line in the docs is critical:

The default behavior of allow_failure changes to true with when: manual. However, if you use when: manual with rules, allow_failure defaults to false.

If nobody has noticed that some jobs use when: manual and others use rules: when: manual, it is going to be hard to get predictable results. If rules change the contents of the pipelines a lot, the observed result may be pipelines that get stuck running for no obvious reason.
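The two spellings that the docs line refers to can sit side by side in one file. A minimal sketch, where the job names manual_plain and manual_rules are hypothetical and the comments restate the documented defaults:

```yaml
# Hypothetical job names; only the placement of `when: manual` differs.
manual_plain:
  script:
    - echo true
  when: manual          # allow_failure defaults to true

manual_rules:
  script:
    - echo true
  rules:
    - when: manual      # allow_failure defaults to false
```

Neither job sets allow_failure, so the only visible difference is one level of nesting under rules.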

GitLab team members can read more in the ticket

Steps to reproduce

The base pipeline for the example projects is:

  • .gitlab-ci.yml
stages:
  - build
  - deploy
  - test

buildjob:
  stage: build
  script:
    - echo true

deployjob:
  stage: deploy
  script:
    - echo true

suite1:
  stage: test
  trigger:
    include:
      - local: ci/child1.yml
    strategy: depend

  • ci/child1.yml
stages:
  - manual

testb:
  stage: manual
  script:
    - echo true
  when: manual

  • suite1 is the trigger job, and is set to either allow_failure: true or allow_failure: false.
  • testb is the child job, and either uses the default allow_failure or is likewise set explicitly to true or false.
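To illustrate where these settings go, the following sketch shows the combination from the first row of table 2 (trigger job allow_failure: true, child job allow_failure: false). It assumes the other variants differ only in the two allow_failure values; the example projects themselves are not reproduced here.

```yaml
# --- .gitlab-ci.yml (trigger job only; stages, buildjob and deployjob as in the base pipeline) ---
suite1:
  stage: test
  trigger:
    include:
      - local: ci/child1.yml
    strategy: depend
  allow_failure: true    # varied per table row: true or false

# --- ci/child1.yml (a separate file, shown in the same block for brevity) ---
stages:
  - manual

testb:
  stage: manual
  script:
    - echo true
  when: manual
  allow_failure: false   # varied per table row: true, false, or omitted for the default
```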

Example Project

test project one

  • trigger job is strategy: depend
  • child job is when: manual

results (gitlab.com GitLab Enterprise Edition 15.0.0-pre 36181ee6)

table 1

| trigger job | child job | pipeline status | trigger job status | downstream status | branch link | pipeline link |
| allow_failure: true | default | passed | failed | skip | branch link | pipeline link |
| allow_failure: false | default | failed | failed | skip | branch link | pipeline link |

table 2

| trigger job | child job | pipeline status | trigger job status | downstream status | branch link | pipeline link |
| allow_failure: true | allow_failure: false | running | pause | manual | branch link | pipeline link |
| allow_failure: true | allow_failure: true | passed | failed | skip | branch link | pipeline link |
| allow_failure: false | allow_failure: false | running | pause | manual | branch link | pipeline link |
| allow_failure: false | allow_failure: true | failed | failed | skip | branch link | pipeline link |

test project two

  • trigger job is strategy: depend
  • child job is rules: when: manual
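The child file for this project is not shown in the report. A minimal sketch of what it might look like, assuming it is the ci/child1.yml from the steps to reproduce with when: manual moved under rules:

```yaml
# ci/child1.yml (assumed shape for test project two; not copied from the project)
stages:
  - manual

testb:
  stage: manual
  script:
    - echo true
  rules:
    - when: manual   # with rules, allow_failure defaults to false unless set explicitly
```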

results (gitlab.com GitLab Enterprise Edition 15.0.0-pre 36181ee6)

table 3

| trigger job | child job | pipeline status | trigger job status | downstream status | branch link | pipeline link |
| allow_failure: true | default | running | pause | manual | branch link | pipeline link |
| allow_failure: false | default | running | pause | manual | branch link | pipeline link |

table 4

| trigger job | child job | pipeline status | trigger job status | downstream status | branch link | pipeline link |
| allow_failure: true | allow_failure: false | running | pause | manual | branch link | pipeline link |
| allow_failure: true | allow_failure: true | passed | failed | skip | branch link | pipeline link |
| allow_failure: false | allow_failure: false | running | pause | manual | branch link | pipeline link |
| allow_failure: false | allow_failure: true | failed | failed | skip | branch link | pipeline link |

What is the current bug behavior?

  • Customers can find a backlog of running pipelines in their instance.
    • One side effect is that a ref is left in the Git repo. Every pipeline has a ref created, and my testing suggests these get cleaned up by a Sidekiq job when the pipeline completes. Repositories will therefore accumulate lots of pipeline refs when their pipelines do not complete.
  • It's not possible to get the pipeline properly green at all.
  • It's not possible to get a pipeline configured like this to run to amber ((!) passed) without setting allow_failure: true on both the child job and the bridge job.
    • Setting allow_failure: true on the bridge job prevents the status of the child pipeline from being properly reflected.
    • If the pipeline rules sometimes cause some or all of the child pipeline jobs to be automatic (not manual), then their failure is not passed up to the parent pipeline.
    • If the manual jobs are executed and fail, the parent pipeline status does not reflect this.
  • If allow_failure is not set explicitly, the opposite defaults of rules: when: manual and when: manual mean that entirely different behaviour is observed.
    • Compare table 1 with table 3. The only difference is how the child job is set to manual. This is a subtle code difference, but the overall behaviour of the pipelines is very different.
    • On the child job: if you've used rules: when: manual and not set allow_failure, the outcome is always a pipeline that is stuck running. On another project that uses when: manual, the pipelines sometimes fail and are sometimes amber.
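One way to take the differing defaults out of the picture is to state allow_failure explicitly on every manual child job. A sketch, assuming the ci/child1.yml layout from the steps to reproduce; tables 2 and 4 show the same results for the same explicit values, whichever way the job is made manual:

```yaml
# ci/child1.yml (sketch: explicit allow_failure so the result does not depend on the default)
stages:
  - manual

testb:
  stage: manual
  script:
    - echo true
  rules:
    - when: manual
  allow_failure: true   # explicit; with rules the default would otherwise be false
```

This only makes the two spellings behave alike. It does not address the other problems listed above, such as the bridge job status not reflecting the child pipeline.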

What is the expected correct behavior?

Customers get predictable results.


Edited by Furkan Ayhan