Backend: Jobs that run on_failure are sometimes unexpectedly skipped when they also have optional needs
Summary
When a job has when: on_failure, it should run when at least one other job in the same pipeline fails. When the job that has when: on_failure also has needs, the job is unexpectedly skipped when other jobs in the same pipeline fail.
If the needs are removed, the when: on_failure job works properly: it runs when other jobs in the same pipeline fail.
Steps to reproduce
- Use a .gitlab-ci.yml file like the one shown below
- Observe that the build job fails
- Observe that the rollback job is skipped (The rollback job should run because build failed.)
build_job:
  stage: build
  script:
    - exit 1
test_job:
  stage: test
  script:
    - date
rollback_job:
  stage: deploy
  needs:
    - job: test_job
      optional: true
    - job: build_job
      optional: true
  script:
    - date
  when: on_failure
Proposal
The job is skipped because it is a DAG job (it uses needs) and at least one of its needs was skipped or ignored. The condition below should be modified to accommodate this scenario.
if @dag && any_skipped_or_ignored?
  # The DAG job is skipped if one of the needs does not run at all.
  'skipped'
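To make the effect of that condition concrete, here is a minimal sketch in Ruby. It is a simplified, hypothetical model and not GitLab's actual implementation: the method names are invented for illustration, and it assumes that in the reproduction pipeline test_job ends up skipped while build_job fails. The "proposed" variant is only one possible shape of the change, where a failed need takes precedence over a skipped one.

```ruby
# Simplified, hypothetical model of the combined status of a DAG job's needs.
# This is NOT GitLab's actual code; all method names are illustrative.

# Behavior described in this issue: any skipped or ignored need wins.
def composite_status_current(need_statuses)
  return 'skipped' if need_statuses.any? { |s| %w[skipped ignored].include?(s) }
  return 'failed' if need_statuses.include?('failed')

  'success'
end

# One possible change (an assumption): a failed need takes precedence.
def composite_status_proposed(need_statuses)
  return 'failed' if need_statuses.include?('failed')
  return 'skipped' if need_statuses.any? { |s| %w[skipped ignored].include?(s) }

  'success'
end

# Simplified: a when: on_failure job runs only if its needs count as failed.
def runs_on_failure?(composite_status)
  composite_status == 'failed'
end

# Assumed statuses for the reproduction: test_job skipped, build_job failed.
needs = %w[skipped failed]
puts runs_on_failure?(composite_status_current(needs))  # false: rollback_job is skipped
puts runs_on_failure?(composite_status_proposed(needs)) # true: rollback_job would run
```

Whether a failed need should always win over a skipped one is a design decision for the maintainers; the sketch only shows why the current ordering hides the failure from the when: on_failure job.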
Example Project
This unexpected behavior can be observed in the
What is the current bug behavior?
A job with when: on_failure is skipped when it contains needs and at least one job in the pipeline has failed.
What is the expected correct behavior?
A job with when: on_failure and needs should run when at least one other job in the pipeline has failed.
The screenshot above shows what things should look like. Removing the needs altogether produces the result shown in the screenshot above.
Possible fixes
Possible Workarounds
- Remove the needs from the rollback job completely
  - This may not be feasible for some environments.
- Possibly: move the jobs that may fail to another stage in the pipeline
- I wrote "sometimes" in the issue title because there is one specific set of circumstances I have identified thus far where the presence of when: on_failure and optional needs does work as expected. See this example pipeline.
A few more thoughts on this:
Observe that the optional needs job fails.
The documentation on needs:optional notes:
To need a job that sometimes does not exist in the pipeline, add optional: true to the needs configuration.
That sounds like it's about the absence or presence of the needed job and not about the success or failure of the job.
Is the thought above right?
It is not possible to use allow_failure to work around this because we also note in the docs:
- If allow_failure: true is set, the job is always considered successful, and later jobs with when: on_failure don't start if this job fails.
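
The quoted rule can be modeled in a few lines of Ruby. This is again a hypothetical, simplified sketch with invented method names, not GitLab's implementation; it only illustrates why allow_failure: true hides the failure from a later when: on_failure job.

```ruby
# Hypothetical, simplified model; not GitLab's actual implementation.

# Status of a finished job as seen by later jobs in the pipeline.
def status_seen_by_later_jobs(status, allow_failure:)
  # With allow_failure: true, a failure is treated as a success.
  status == 'failed' && allow_failure ? 'success' : status
end

# Simplified: a when: on_failure job starts only if an earlier job counts as failed.
def on_failure_job_starts?(earlier_statuses)
  earlier_statuses.include?('failed')
end

seen = [status_seen_by_later_jobs('failed', allow_failure: true)]
puts on_failure_job_starts?(seen) # false: the rollback job would not start
```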

