
"Prevent outdated deployment jobs" still doesn't cancel pending deployment jobs


Summary

When several deployment jobs for the same environment are pending at the same time (environment:name is set, they share a resource_group, etc.) and "Prevent outdated deployment jobs" is checked in the project's CI/CD settings, all but the latest deployment job should be cancelled. Instead, every deployment job is executed, potentially in the wrong order, which unintentionally rolls the deployment back to an earlier commit. This seems related to e.g. #408981 (closed), but may be slightly different.

Steps to reproduce

  1. Configure a repository with the following .gitlab-ci.yml:

    stages:
      - build
      - deploy
    
    build:
      stage: build
      script:
        - sleep 60
    
    deploy:
      stage: deploy
      environment:
        name: production
        action: start
      resource_group: production
      script:
        - sleep 300

    and wait for the initial build/deploy jobs to finish.

  2. Commit a change which swaps out the build job's sleep 60 for a sleep 300 (commit message "long build")

  3. Wait a moment and make sure that a new pipeline has been created

  4. Commit a change which swaps the build job's sleep back to a sleep 60 (commit message "short build")

  5. Observe that, after a minute, the "short build" deploy job starts, while the "long build" build job is still ongoing (this is fine, "short build" is the latest commit and therefore is correctly being deployed)

  6. Observe that, after four more minutes, the "long build" build job finishes, and the subsequent "long build" deploy job goes to "waiting" state – even though it is outdated (a later deploy job is currently running)! Ok, but maybe it will be cancelled once the currently-running deploy job finishes...?

  7. Observe that, after one more minute, the "short build" deploy job finishes, and the "long build" deploy job is started – even though it is outdated (a later deploy job has already completed)! Now we have a running outdated deploy job (an old commit, in a pipeline created before the most recent pipeline to run).

  8. Once all jobs have completed, observe on the Environments page that the latest deployment is from "long build", even though "short build" was committed more recently
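The timeline in the steps above can be checked with a small simulation. This is only a sketch of the observed behavior, not GitLab's scheduler: the `Pipeline` class and `simulate` function are hypothetical names, both commits are assumed to be pushed at roughly t=0, and the resource group is assumed to hand the lock to whichever deploy job becomes ready first.

```python
from dataclasses import dataclass


@dataclass
class Pipeline:
    name: str
    created_at: int   # seconds; both commits assumed pushed at roughly t=0
    build_secs: int   # duration of the build job
    deploy_secs: int  # duration of the deploy job


def simulate(pipelines):
    """Run deploy jobs one at a time through a single resource group.

    Assumption: the job whose build finishes first gets the lock first,
    and no outdated job is ever cancelled (the behavior reported here).
    """
    ready_order = sorted(pipelines, key=lambda p: p.created_at + p.build_secs)
    free_at = 0  # time at which the resource group is next available
    timeline = []
    for p in ready_order:
        ready_at = p.created_at + p.build_secs
        start = max(ready_at, free_at)  # wait while another deploy holds the lock
        end = start + p.deploy_secs
        timeline.append((p.name, start, end))
        free_at = end
    return timeline


long_build = Pipeline("long build", created_at=0, build_secs=300, deploy_secs=300)
short_build = Pipeline("short build", created_at=0, build_secs=60, deploy_secs=300)

timeline = simulate([long_build, short_build])
# The deployment that finishes last is the one shown on the Environments page.
last_deployed = max(timeline, key=lambda entry: entry[2])[0]

for name, start, end in timeline:
    print(f"{name}: deploy runs from t={start}s to t={end}s")
print("environment ends up on:", last_deployed)
```

Under these assumptions the "short build" deploy runs from 60s to 360s and the "long build" deploy from 360s to 660s, so the environment ends up on the older commit, matching steps 5 to 8.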

Example Project

What is the current bug behavior?

"Pending" (broadly defined) deploy jobs for non-"latest" pipelines are not consistently cancelled or skipped, and instead run to completion, causing deployed environments to run code other than the latest commit.

What is the expected correct behavior?

If a pipeline containing a deploy job for environment X is started, and "Prevent outdated deployment jobs" is enabled, all previous instances of that deployment job (that is, all jobs deploying to environment X) should be cancelled/skipped/aborted.
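The expected rule can be sketched as a small predicate. This is a minimal illustration of the behavior requested here, not GitLab's implementation: `DeployJob`, `NOT_YET_RUNNING` and `outdated_jobs` are hypothetical names, a higher pipeline ID is assumed to mean a more recently created pipeline, and only jobs that have not started yet are selected (whether running jobs should also be aborted is left open).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployJob:
    pipeline_id: int  # assumption: higher ID = pipeline created later
    environment: str
    status: str       # e.g. "created", "waiting", "running", "success"


# Hypothetical set of states in which a job has not begun executing.
NOT_YET_RUNNING = {"created", "pending", "waiting"}


def outdated_jobs(jobs):
    """Return the jobs that should be cancelled under the expected behavior.

    A job is considered outdated when a newer pipeline has a deploy job
    for the same environment and the job itself has not started yet.
    """
    newest = {}
    for job in jobs:
        current = newest.get(job.environment, job.pipeline_id)
        newest[job.environment] = max(current, job.pipeline_id)
    return [
        job
        for job in jobs
        if job.status in NOT_YET_RUNNING
        and job.pipeline_id < newest[job.environment]
    ]


# State at step 7: "short build" (newer pipeline) has finished deploying,
# while "long build" (older pipeline) is still waiting on the resource group.
jobs = [
    DeployJob(pipeline_id=101, environment="production", status="waiting"),
    DeployJob(pipeline_id=102, environment="production", status="success"),
]
to_cancel = outdated_jobs(jobs)
print(to_cancel)
```

With the state from step 7, the waiting job from the older pipeline is the only one selected, which is the job that currently goes on to run.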

Relevant logs and/or screenshots

Output of checks

Results of GitLab environment info

This is reproducible on GitLab v17.9.1-ee.


Possible fixes
