CI: Environment destroy job is retried endlessly even if it can never succeed
Summary
As of GitLab 16.6 Premium (the behavior started in an earlier version), the "stop environment" CI job is retried endlessly after it fails, congesting runners and slowing down every other CI pipeline.
This happens even for very old pipelines, where the actual review environment was stopped long ago and the job can never succeed (either because the infrastructure changed in the meantime or because that particular revision of the job had a bug).
GitLab currently offers no apparent way to stop those endless retries.
Steps to reproduce
Take a simple CI pipeline like this:
deploy review:
  only: [ merge_requests ]
  environment:
    name: review-$CI_MERGE_REQUEST_IID
    on_stop: "destroy review"
    auto_stop_in: 7 days
  script:
    - "true"
destroy review:
  only: [ merge_requests ]
  environment:
    name: review-$CI_MERGE_REQUEST_IID
    action: stop
  script:
    - "false"  # always fails, so the stop job can never succeed
Example Project
We are using self-hosted GitLab Premium, but I will try to reproduce this on GitLab.com.
What is the current bug behavior?
After an MR is closed, stopping the environment fails (that is expected), but GitLab then retries forever, creating a new "destroy review" job each time.
Even fixing the "destroy review" script cannot resolve the problem, since the MR is already closed and the retries run against a fixed Git commit.
What is the expected correct behavior?
Before the bug was introduced, GitLab would fail the job once and not retry it. That, at least, did not create so many zombie jobs.
Relevant logs and/or screenshots
Note that the list of failed jobs is so long that I can't even scroll to the bottom.
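Since the job list is too long to inspect in the UI, the retries could be counted through the Jobs API instead. The following is only a rough sketch, not something we have run against our instance: it assumes the standard `GET /projects/:id/jobs` endpoint with `scope[]=failed` and a token with API read access. The helper names `count_failed_jobs` and `fetch_failed_jobs`, and the `max_pages` limit, are made up for illustration.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter


def count_failed_jobs(jobs, job_name="destroy review"):
    """Count failed jobs named `job_name`, grouped by commit SHA.

    `jobs` is a list of job objects as returned by the GitLab Jobs API.
    """
    counts = Counter()
    for job in jobs:
        if job.get("name") == job_name and job.get("status") == "failed":
            sha = (job.get("commit") or {}).get("id", "unknown")
            counts[sha] += 1
    return counts


def fetch_failed_jobs(base_url, project_id, token, max_pages=10):
    """Fetch failed jobs page by page (hypothetical helper).

    `project_id` is assumed to be the numeric project ID.
    """
    jobs = []
    for page in range(1, max_pages + 1):
        query = urllib.parse.urlencode(
            {"scope[]": "failed", "per_page": 100, "page": page}
        )
        url = f"{base_url}/api/v4/projects/{project_id}/jobs?{query}"
        request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:  # an empty page means there are no more jobs
            break
        jobs.extend(batch)
    return jobs
```

Passing the result of `fetch_failed_jobs(...)` to `count_failed_jobs(...)` should give one counter entry per commit. A single commit with a large and growing count would match the endless retries described above.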
Output of checks
Results of GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:check SANITIZE=true`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`) (we will only investigate if the tests are passing)