Flag 'ci_same_stage_job_needs' can cause jobs to be skipped when a manual job is involved
Summary
When the flag `ci_same_stage_job_needs` is turned on (feature flag for #30632 (closed)), existing needs-based configuration can cause jobs to be skipped at execution when a manual action is required on the earliest job that the needs configuration looks for.
See also customer ticket https://gitlab.zendesk.com/agent/tickets/227183 (internal link) for more information.
Steps to reproduce
Run the following pipeline on a project with the `ci_same_stage_job_needs` flag enabled.
`.gitlab-ci.yml` contents:
stages:
  - first
  - second
  - third
  - fourth
  - fifth

First:
  stage: first
  script:
    - echo

Second:
  stage: second
  script:
    - echo
  when: manual

Third:
  stage: third
  needs: ["Second"]
  script:
    - echo
  allow_failure: false

Fourth:
  stage: fourth
  needs: ["Third"]
  script:
    - echo
  allow_failure: false

Fifth:
  stage: fifth
  needs: ["Fourth"]
  script:
    - echo
  allow_failure: false
After the pipeline auto-executes job `First`, invoke the next stage's lone manual job `Second`, whose completion should run the remaining pipeline.
After `Second` completes execution, observe that `Third` executes, but then `Fourth` and `Fifth` do not follow.
The pipeline remains hung after `Third` completes, leaving the remaining jobs in a skipped state.
Disable the flag `ci_same_stage_job_needs` and, in a new pipeline, observe that after `Third` executes, `Fourth` and `Fifth` follow.
Observe also that the above CI config does not make use of same-stage needs references. All needs references are cross-stage (as permitted prior to this flag) so this is a regression.
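For contrast, a same-stage needs reference, which is the kind of dependency the flag name suggests it is meant to permit, might look like the minimal sketch below. The stage and job names (`build`, `compile`, `package`) are hypothetical and are not taken from the example project. Nothing of this shape appears in the reproduction config above.

```yaml
# Hypothetical illustration only: both jobs live in the same stage,
# and "package" declares a needs reference to "compile".
stages:
  - build

compile:
  stage: build
  script:
    - echo "compile"

package:
  stage: build
  needs: ["compile"]   # same-stage needs reference
  script:
    - echo "package"
```

In the reproduction config, every `needs` entry instead points at a job in an earlier stage.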
Example Project
https://gitlab.com/gitlab-gold/hchouraria/sample-ci/
What is the current bug behavior?
Jobs with needs defined remain in a skipped state even after the job they depend upon passes.
What is the expected correct behavior?
Jobs with needs defined must execute after the job they depend upon passes.
Relevant logs and/or screenshots
One observable difference in Sidekiq logs is that when the `Third` job completes:
- With the flag `ci_same_stage_job_needs` enabled, no `BuildSuccessWorker` is found for the `Third` job
- With the flag `ci_same_stage_job_needs` disabled, a `BuildSuccessWorker` is observed for the `Third` job
Output of checks
This bug happens on GitLab.com
Possible fixes
A workaround here is to retry the last passed job (job `Third` in the example above), which then appears to fire the internal events necessary to execute the next job (job `Fourth`), then retry that one (job `Fourth`) to execute the next (job `Fifth`), and so on.
It may be impractical or disallowed for certain CI config implementations to retry their jobs.