Child pipelines with manual jobs are unintuitive with strategy: depend. Pipelines can easily get stuck in running state.

Summary

The problem occurs under these conditions:
- The pipeline has child pipelines.
- strategy: depend is in use on the bridge/trigger job.
- The child pipelines contain manual jobs only.
- allow_failure: is set to true, set to false, or not specified on the bridge job and on the child pipeline jobs.
- The child pipeline jobs are set with when: manual.
- The manual job has not been run.

Outcome:

The parent pipeline is either stuck running (50% of configuration options), failed (25%), or (!) passed (25%)
The bridge job is either pending or failed (50/50)

If allow_failure is not set explicitly, the following line in the docs is critical:

The default behavior of allow_failure changes to true with when: manual. However, if you use when: manual with rules, allow_failure defaults to false.

If nobody has noticed that some jobs use when: manual and others use rules: when: manual, it is going to be hard to get predictable results. If the contents of the pipelines change a lot because of rules, the observed result may be pipelines that get stuck running for no obvious reason.
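
To make the difference concrete, here is a minimal sketch of the two ways of marking a job as manual, with the default from the docs line quoted above noted in comments. The job names are hypothetical and are not taken from the example projects.

```yaml
# Job names below are hypothetical, for illustration only.

# Job-level `when: manual`: allow_failure defaults to true.
manual_plain:
  script:
    - echo true
  when: manual

# `when: manual` inside `rules`: allow_failure defaults to false.
manual_via_rules:
  script:
    - echo true
  rules:
    - when: manual
```

The two jobs look almost identical, but unless allow_failure is set explicitly they end up with opposite defaults.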

GitLab team members can read more in the ticket

Steps to reproduce

Base pipeline for the example projects is:

.gitlab-ci.yml

stages:
  - build
  - deploy
  - test

buildjob:
  stage: build
  script:
    - echo true

deployjob:
  stage: deploy
  script:
    - echo true

suite1:
  stage: test
  trigger:
    include:
      - local: ci/child1.yml
    strategy: depend

ci/child1.yml

stages:
  - manual

testb:
  stage: manual
  script:
    - echo true
  when: manual

suite1 is the trigger job, and is set to either allow_failure: true or allow_failure: false.
testb is the child job, and either uses the default allow_failure or is likewise set explicitly to true or false.
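
As an illustration, one of the tested combinations (trigger job allow_failure: true, child job allow_failure: false) could be written by changing only the two jobs below; the rest of the base files stays as shown above. This is a sketch of one variant, not a copy of the example project's files.

```yaml
# .gitlab-ci.yml (trigger job only)
suite1:
  stage: test
  trigger:
    include:
      - local: ci/child1.yml
    strategy: depend
  allow_failure: true   # explicit setting on the bridge job

# ci/child1.yml (child job only)
testb:
  stage: manual
  script:
    - echo true
  when: manual
  allow_failure: false  # overrides the default of true for when: manual
```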

Example Project

test project one

trigger job is strategy: depend
child job is when: manual

results (gitlab.com GitLab Enterprise Edition 15.0.0-pre 36181ee6)

table 1

trigger job	child job	pipeline status	trigger job status	downstream status	branch link	pipeline link
`allow_failure: true`	default	passed	failed	skip	branch link	pipeline link
`allow_failure: false`	default	failed	failed	skip	branch link	pipeline link

table 2

trigger job	child job	pipeline status	trigger job status	downstream status	branch link	pipeline link
`allow_failure: true`	`allow_failure: false`	running	pause	manual	branch link	pipeline link
`allow_failure: true`	`allow_failure: true`	passed	failed	skip	branch link	pipeline link
`allow_failure: false`	`allow_failure: false`	running	pause	manual	branch link	pipeline link
`allow_failure: false`	`allow_failure: true`	failed	failed	skip	branch link	pipeline link

test project two

trigger job is strategy: depend
child job is rules: when: manual
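
The child pipeline file for this project is not reproduced in the issue. Assuming it differs from the base ci/child1.yml only in how the job is made manual, it would look roughly like this:

```yaml
# ci/child1.yml (assumed contents for test project two)
stages:
  - manual

testb:
  stage: manual
  script:
    - echo true
  rules:
    - when: manual   # allow_failure now defaults to false
```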

results (gitlab.com GitLab Enterprise Edition 15.0.0-pre 36181ee6)

table 3

trigger job	child job	pipeline status	trigger job status	downstream status	branch link	pipeline link
`allow_failure: true`	default	running	pause	manual	branch link	pipeline link
`allow_failure: false`	default	running	pause	manual	branch link	pipeline link

table 4

trigger job	child job	pipeline status	trigger job status	downstream status	branch link	pipeline link
`allow_failure: true`	`allow_failure: false`	running	pause	manual	branch link	pipeline link
`allow_failure: true`	`allow_failure: true`	passed	failed	skip	branch link	pipeline link
`allow_failure: false`	`allow_failure: false`	running	pause	manual	branch link	pipeline link
`allow_failure: false`	`allow_failure: true`	failed	failed	skip	branch link	pipeline link

What is the current bug behavior?

Customers can find a backlog of running pipelines in their instance.
- One side effect is that a ref is left in the Git repo. Every pipeline has a ref created, and my testing suggests these get cleaned up by a Sidekiq job when the pipeline completes. Repositories will thus accumulate lots of pipeline refs when their pipelines are not completing.
It's not possible to get the pipeline properly green at all.
It's not possible to get a pipeline configured like this to run to amber ( (!) passed ) without setting allow_failure: true on the child job and on the bridge job
- setting allow_failure: true on the bridge job prevents the status of the child pipeline from being properly reflected.
- if the pipeline rules sometimes cause some or all of the child pipeline jobs to be automatic (not manual) then their failure is not passed up to the parent pipeline.
- in the event that the manual jobs are executed, and fail: the parent pipeline status does not reflect this.
If allow_failure is not set explicitly, then the opposite default behaviour of rules: when: manual and when: manual means that entirely different behaviour is observed.
- Compare Table 1 with Table 3. The only difference is how the child job is set manual. This is a subtle code difference, but the overall behaviour of the pipelines is very different.
- On the child job: if you've used rules: when: manual and not set allow_failure, the outcome is always a pipeline that is stuck running. On another project, with when: manual, sometimes the pipelines fail and sometimes they're amber.
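
For reference, the one combination that reaches amber in the tables sets allow_failure: true on both sides. A sketch using the job names from the reproduction steps, shown with the rules: form (table 4, second row):

```yaml
# .gitlab-ci.yml (trigger job only)
suite1:
  stage: test
  trigger:
    include:
      - local: ci/child1.yml
    strategy: depend
  allow_failure: true

# ci/child1.yml (child job only)
testb:
  stage: manual
  script:
    - echo true
  rules:
    - when: manual
      allow_failure: true   # without this, the rules form defaults to false
```

In the results above this gives a (!) passed parent pipeline with a failed trigger job, at the cost described in the list: the child pipeline's real status is no longer reflected in the parent.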

What is the expected correct behavior?

Customers get predictable results.

Edited Apr 04, 2024 by Furkan Ayhan