Undesirable effect of the "Skip outdated deployment jobs" CI feature on manual jobs and their stages
Summary
A customer raised a ticket (internal links) because they were getting Slack notifications about failing deployment jobs.
The failing jobs were in historic pipelines and related to deployments that had never been run.
The customer uses manual deployment jobs.
Investigation showed that, depending on how the pipeline is configured, leaving this feature enabled (the default) can put historic pipelines into some odd states.
The docs for Skip outdated deployment jobs state: "The pending deployment jobs will be skipped." This does not happen with manual jobs. In most cases, undeployed manual jobs are switched to a failed state, and the stages containing them can switch from blocked or manual to running or failed.
However, a number of outcomes are possible, as the examples show, potentially all within the same pipeline.
Workaround
Disable Skip outdated deployment jobs on the project (Settings > CI/CD > General pipelines).
Severity: 3
Steps to reproduce / Example Project
The example CI configuration is attached below, along with screenshots.
What is the current bug behavior?
Manual jobs are set to failed, and stages in pipelines are switched to 'running' (or possibly 'pending'). The treatment can vary between environments in the same pipeline; see example 3.
The purpose of this feature seems to be to prevent multiple live pipelines from deploying to the same environment in the wrong order: if an earlier pipeline tries to deploy old code to an environment, that job should be skipped.
This doesn't really apply to manual jobs, which only deploy when someone chooses to run them.
What is the expected correct behavior?
Either leave manual jobs unchanged, or set them to skipped. Skipping also removes the play button, reducing the chance of an accidental deployment.
See example 3 for a CI config in which, of three manual deploy jobs, one is left undeployed, one is set to failed, and one is set to skipped. So this feature does sometimes set manual jobs to skipped.
Relevant logs and/or screenshots
- The example CI only runs deployments once a merge request has been merged.
- In each example, changes to the project are made and merged twice, so the full CI configuration runs twice.
- The first time, no deployments are run.
- The second time, all deployments are run.
- This issue concerns what happens to the earlier pipeline as a result.
- The status of the first pipeline is shown both in its original state and in its revised state after the subsequent deployment.
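The attached configuration is not reproduced here. As a rough illustration of the setup described above, the sketch below assumes that "after a merge request" means pipelines on the default branch once the merge request is merged. The job names, `echo` scripts, and the `stg` environment are placeholders. In the first pipeline the manual `deploy_stg` job is left unplayed; in the second pipeline it is played, which is when the first pipeline's copy changes state.

```yaml
# Hypothetical reduced setup; job names, scripts and environment are illustrative,
# not the attached gitlab-ci.yml.
stages:
  - test
  - deploy

test:
  stage: test
  script:
    - echo "run tests"

deploy_stg:
  stage: deploy
  script:
    - echo "deploy to stg"   # placeholder deployment command
  environment:
    name: stg
  rules:
    # Only add the deployment job to pipelines on the default branch,
    # i.e. after a merge request has been merged; it waits for a manual play.
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
      when: manual
```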
Example 1
- gitlab-ci.yml (attachment)
- Pipeline summary, before and after (screenshots)
Example 2
- gitlab-ci.yml (attachment)
- Pipeline summary, before and after (screenshots)
Example 3
This was from earlier in the testing, before the test case was simplified.
This produces an outcome where one of the deploy jobs is skipped and the stage then goes into 'pending', because another job in that stage would normally run once the previous stage has been deployed.
- gitlab-ci.yml (attachment)
- Pipeline summary, before and after (screenshots)
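One plausible shape for the Example 3 situation is sketched below; it is not the attached configuration, and all job and stage names are hypothetical. `deploy_stg` is a blocking manual job (`allow_failure: false`), so later stages wait for it. The `deploy-prod` stage then holds both a manual deploy job and an automatic job (`post_deploy_prod`) that would normally run once the previous stage has been deployed; if the manual deploy job in that stage is skipped, the automatic job is what leaves the stage 'pending'.

```yaml
# Hypothetical sketch of the Example 3 shape; names and scripts are illustrative.
stages:
  - deploy-stg
  - deploy-prod

deploy_stg:
  stage: deploy-stg
  script:
    - echo "deploy to stg"
  environment:
    name: stg
  when: manual
  allow_failure: false   # blocking manual job: later stages wait until it is run

deploy_prod:
  stage: deploy-prod
  script:
    - echo "deploy to prod"
  environment:
    name: prod
  when: manual

# No `when:` keyword, so this defaults to on_success and runs automatically
# once the previous stage has completed.
post_deploy_prod:
  stage: deploy-prod
  script:
    - echo "post-deploy step"
```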
Failed job status
To complete the account of what happens, the job page displays the following:
Full text, so that it's searchable:
The deployment job is older than the previously succeeded deployment job, and therefore cannot be run
The deployment of this job to stg did not succeed.
Output of checks
Results of GitLab environment info
- Customer reported the issue on GitLab.com (13.5.0-pre 417adc72).
- Reproduced on self-managed 13.3.5.