Flag 'ci_same_stage_job_needs' can cause jobs to be skipped when a manual job is involved

Summary

When the ci_same_stage_job_needs feature flag (for #30632) is enabled, an existing needs-based configuration can cause jobs to be skipped during execution if the earliest job in the needs chain requires a manual action.

See also customer ticket https://gitlab.zendesk.com/agent/tickets/227183 (internal link) for more information.

Steps to reproduce

Run the following pipeline on a project with the ci_same_stage_job_needs flag enabled.

Contents of `.gitlab-ci.yml`:
stages:
  - first
  - second
  - third
  - fourth
  - fifth

First:
  stage: first
  script:
    - echo

Second:
  stage: second
  script:
    - echo
  when: manual

Third:
  stage: third
  needs: ["Second"]
  script:
    - echo
  allow_failure: false

Fourth:
  stage: fourth
  needs: ["Third"]
  script:
    - echo
  allow_failure: false

Fifth:
  stage: fifth
  needs: ["Fourth"]
  script:
    - echo
  allow_failure: false

After the pipeline automatically runs job First, manually trigger Second, the only job in the next stage. Its completion should run the rest of the pipeline.

After Second completes, observe that Third runs, but Fourth and Fifth do not follow.

The pipeline hangs after Third completes, leaving Fourth and Fifth in a skipped state.

Disable the ci_same_stage_job_needs flag, run a new pipeline, and observe that Fourth and Fifth run after Third.

Note also that the above CI configuration does not use same-stage needs references. All needs references are cross-stage (as was already permitted before this flag), so this is a regression.

Example Project

https://gitlab.com/gitlab-gold/hchouraria/sample-ci/

What is the current bug behavior?

Jobs with needs defined remain in a skipped state even after the job they depend on passes.

What is the expected correct behavior?

Jobs with needs defined must execute after the job they depend upon passes.
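The expected behavior can be illustrated with a small Python sketch. It is a simplified, hypothetical model, not GitLab's scheduler: each job lists its upstream jobs, a job runs once all of them have passed, a manual job also waits for an explicit trigger, and every job that runs is assumed to pass. Second has no `needs`, so its implicit wait on the earlier stage is modeled as a dependency on First. The names `UPSTREAM`, `MANUAL`, and `run_pipeline` are illustrative only.

```python
# Simplified, hypothetical model of the expected needs-based progression
# for the reproduction pipeline. This is NOT GitLab's actual scheduler.

# Upstream jobs for each job. Third, Fourth and Fifth use `needs:` directly.
# Second has no `needs:`, so its implicit wait on the earlier stage is
# modeled as depending on First.
UPSTREAM = {
    "First": [],
    "Second": ["First"],
    "Third": ["Second"],
    "Fourth": ["Third"],
    "Fifth": ["Fourth"],
}

# Jobs declared with `when: manual`.
MANUAL = {"Second"}


def run_pipeline(upstream, manual, triggered):
    """Return the names of the jobs that run, in order.

    A job runs once all of its upstream jobs have passed. A manual job
    additionally waits until it is listed in `triggered`. Every job that
    runs is assumed to pass.
    """
    passed = []
    pending = dict(upstream)
    progress = True
    while progress:
        progress = False
        for name, deps in list(pending.items()):
            if not all(dep in passed for dep in deps):
                continue  # an upstream job has not passed yet
            if name in manual and name not in triggered:
                continue  # manual job still waiting for a user action
            passed.append(name)
            del pending[name]
            progress = True
    return passed


if __name__ == "__main__":
    # Expected: triggering Second lets the whole needs chain run.
    print(run_pipeline(UPSTREAM, MANUAL, triggered={"Second"}))
    # ['First', 'Second', 'Third', 'Fourth', 'Fifth']
    # Without the manual trigger, only First runs.
    print(run_pipeline(UPSTREAM, MANUAL, triggered=set()))
    # ['First']
```

In this model, triggering Second runs all five jobs, whereas with the flag enabled the reported behavior stops the chain after Third.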

Relevant logs and/or screenshots

One observable difference in Sidekiq logs is that when the Third job completes:

  • With the ci_same_stage_job_needs flag enabled, no BuildSuccessWorker entry is found for the Third job.
  • With the ci_same_stage_job_needs flag disabled, a BuildSuccessWorker entry is observed for the Third job.
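To make this comparison repeatable, a Python sketch can filter a Sidekiq log for BuildSuccessWorker entries. It assumes the log uses GitLab's structured JSON format, one object per line with a `class` field naming the worker; that format and the helper name `worker_entries` are assumptions, not details taken from this issue.

```python
import json


def worker_entries(lines, worker="BuildSuccessWorker"):
    """Yield parsed log entries whose `class` field equals `worker`.

    Assumes one JSON object per line (structured Sidekiq log format);
    blank or non-JSON lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a structured log line
        if isinstance(entry, dict) and entry.get("class") == worker:
            yield entry
```

For example, pass an open Sidekiq log file to `worker_entries` and check whether an entry appears after Third completes, once with the flag enabled and once with it disabled.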

Output of checks

This bug happens on GitLab.com

Possible fixes

A workaround is to retry the last passed job (Third in the example above). This appears to fire the internal events needed to run the next job (Fourth). Retrying Fourth then runs Fifth, and so on.

However, retrying jobs may be impractical or disallowed in some CI configurations.
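Where retrying is allowed, the workaround could be scripted against the GitLab REST API's job retry endpoint (`POST /projects/:id/jobs/:job_id/retry`, authenticated with a `PRIVATE-TOKEN` header). The Python sketch below only builds and sends a single retry request; the helper names `build_retry_request` and `retry_job`, the token, and any job IDs passed in are placeholders, and waiting for each retried job to finish before retrying the next is left to the caller.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://gitlab.com/api/v4"


def build_retry_request(project, job_id, token, api_url=API_URL):
    """Build a POST /projects/:id/jobs/:job_id/retry request.

    `project` may be a numeric ID or a full path such as
    "gitlab-gold/hchouraria/sample-ci"; it is URL-encoded either way.
    """
    project_id = urllib.parse.quote(str(project), safe="")
    url = f"{api_url}/projects/{project_id}/jobs/{job_id}/retry"
    return urllib.request.Request(
        url, method="POST", headers={"PRIVATE-TOKEN": token}
    )


def retry_job(project, job_id, token, api_url=API_URL):
    """Send the retry request and return the API's JSON response as a dict."""
    request = build_retry_request(project, job_id, token, api_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Following the workaround, one would call `retry_job` with the job ID of Third, wait for Fourth to finish, then call it with Fourth's job ID so that Fifth runs, and so on.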

Edited by Harsh Chouraria