Child pipelines with manual jobs are unintuitive with strategy: depend. Pipelines can easily get stuck in running state.
Summary
- Pipelines with child pipelines.
- `strategy: depend` is in use on the bridge/trigger job
- manual jobs (only) in the child pipelines
- `allow_failure:` is set true, false, or not specified on the bridge job and the child pipeline jobs
- Child pipelines set with `when: manual`
- The manual job has not been run
Outcome:
- The parent pipeline is either stuck running (50% of configuration options), failed (25%) or `(!) Passed` (25%)
- The bridge job is either pending or failed (50/50)
If `allow_failure` is not set explicitly, the following line in the docs is critical:

> The default behavior of `allow_failure` changes to `true` with `when: manual`. However, if you use `when: manual` with `rules`, `allow_failure` defaults to `false`.
If it has gone unnoticed that some jobs use `when: manual` and others use `rules: when: manual`, it's going to be hard to get predictable results. If the contents of the pipelines change a lot because of rules, then the observed result may be pipelines that get stuck running for no obvious reason.
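To make the difference in defaults concrete, here is a minimal sketch of two child-pipeline jobs that look almost identical. The job names `manual_plain` and `manual_via_rules` are hypothetical and used only for illustration; the defaults noted in the comments are the ones stated in the docs line quoted above.

```yaml
# Hypothetical job names, for illustration only.
manual_plain:
  script:
    - echo true
  when: manual        # allow_failure is not set, so it defaults to true

manual_via_rules:
  script:
    - echo true
  rules:
    - when: manual    # allow_failure is not set, so it defaults to false
```

Setting `allow_failure` explicitly on every manual job removes the dependence on which of the two forms was used.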
GitLab team members can read more in the ticket
Steps to reproduce
Base pipeline for the example projects is:
`.gitlab-ci.yml`

```yaml
stages:
  - build
  - deploy
  - test

buildjob:
  stage: build
  script:
    - echo true

deployjob:
  stage: deploy
  script:
    - echo true

suite1:
  stage: test
  trigger:
    include:
      - local: ci/child1.yml
    strategy: depend
```

`ci/child1.yml`

```yaml
stages:
  - manual

testb:
  stage: manual
  script:
    - echo true
  when: manual
```
- `suite1` is the trigger job, and is either set `allow_failure: true` or `false`.
- `testb` is the child job and either runs the default `allow_failure` or is similarly set `true` or `false`.
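As an illustration of one permutation, the following sketch sets `allow_failure: true` explicitly on both the trigger job and the child job. Only the `allow_failure` lines are added to the base files; the rest of `.gitlab-ci.yml` is assumed to be unchanged.

```yaml
# .gitlab-ci.yml (trigger job only; stages, buildjob and deployjob as in the base pipeline)
suite1:
  stage: test
  trigger:
    include:
      - local: ci/child1.yml
    strategy: depend
  allow_failure: true   # explicit setting on the bridge job
```

```yaml
# ci/child1.yml
stages:
  - manual

testb:
  stage: manual
  script:
    - echo true
  when: manual
  allow_failure: true   # explicit setting on the child job
```

The other permutations differ only in the two `allow_failure` values, or in omitting the line on the child job to get the default.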
Example Project
- trigger job is `strategy: depend`
- child job is `when: manual`
results (gitlab.com GitLab Enterprise Edition 15.0.0-pre 36181ee6)
table 1
trigger job | child job | pipeline status | trigger job status | downstream status | branch link | pipeline link |
---|---|---|---|---|---|---|
`allow_failure: true` | default | (status image) | (status image) | (status image) | branch link | pipeline link |
`allow_failure: false` | default | (status image) | (status image) | (status image) | branch link | pipeline link |
table 2
trigger job | child job | pipeline status | trigger job status | downstream status | branch link | pipeline link |
---|---|---|---|---|---|---|
`allow_failure: true` | `allow_failure: false` | (status image) | (status image) | (status image) | branch link | pipeline link |
`allow_failure: true` | `allow_failure: true` | (status image) | (status image) | (status image) | branch link | pipeline link |
`allow_failure: false` | `allow_failure: false` | (status image) | (status image) | (status image) | branch link | pipeline link |
`allow_failure: false` | `allow_failure: true` | (status image) | (status image) | (status image) | branch link | pipeline link |
- trigger job is `strategy: depend`
- child job is `rules: when: manual`
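The child configuration for this second set of runs is not reproduced in the report. A minimal sketch, assuming it is the same `ci/child1.yml` with only the way `testb` is made manual changed:

```yaml
# ci/child1.yml (assumed variant: the manual setting moves into rules)
stages:
  - manual

testb:
  stage: manual
  script:
    - echo true
  rules:
    - when: manual
```

For the rows where the child job's `allow_failure` is set explicitly, an `allow_failure: true` or `allow_failure: false` line would be added to `testb`.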
results (gitlab.com GitLab Enterprise Edition 15.0.0-pre 36181ee6)
table 3
trigger job | child job | pipeline status | trigger job status | downstream status | branch link | pipeline link |
---|---|---|---|---|---|---|
`allow_failure: true` | default | (status image) | (status image) | (status image) | branch link | pipeline link |
`allow_failure: false` | default | (status image) | (status image) | (status image) | branch link | pipeline link |
table 4
trigger job | child job | pipeline status | trigger job status | downstream status | branch link | pipeline link |
---|---|---|---|---|---|---|
`allow_failure: true` | `allow_failure: false` | (status image) | (status image) | (status image) | branch link | pipeline link |
`allow_failure: true` | `allow_failure: true` | (status image) | (status image) | (status image) | branch link | pipeline link |
`allow_failure: false` | `allow_failure: false` | (status image) | (status image) | (status image) | branch link | pipeline link |
`allow_failure: false` | `allow_failure: true` | (status image) | (status image) | (status image) | branch link | pipeline link |
What is the current bug behavior?
- Customers can find a backlog of running pipelines in their instance.
  - One side effect is that a `ref` is left in the Git repo. All pipelines have a `ref` created, and my testing suggests these get cleaned up by a sidekiq job when the pipeline completes. Repositories will thus accumulate lots of pipeline refs when their pipelines are not completing.
- It's not possible to get the pipeline properly green at all.
- It's not possible to get a pipeline configured like this to run to amber (`(!) passed`) without setting `allow_failure: true` on the child job and on the bridge job.
  - Setting the bridge job `allow_failure: true` prevents the status of the child pipeline from being properly reflected.
    - If the pipeline rules sometimes cause some or all of the child pipeline jobs to be automatic (not manual), then their failure is not passed up to the parent pipeline.
    - In the event that the manual jobs are executed and fail, the parent pipeline status does not reflect this.
- If `allow_failure` is not set explicitly, then the opposite default behaviour of `rules: when: manual` and `when: manual` means that entirely different behaviour is observed.
  - Compare Table 1 with Table 3. The only difference is how the child job is set manual. This is a subtle code difference, but the overall behaviour of the pipelines is very different.
  - On the child job: if you've used `rules: when: manual` and not set `allow_failure`, your outcome is always going to be that the pipeline is stuck running. On another project, with `when: manual`, sometimes the pipelines fail, sometimes they're amber.
What is the expected correct behavior?
Customers get predictable results.