Auto stop doesn't work after redeploying an environment
Summary
When using auto_stop_in, the environment is stopped as expected the first time. However, if the deploy job runs again and the environment starts a second time (or more), the stop job does not run again when auto_stop_in expires, so the environment is not actually stopped.
Confusingly, the GitLab UI still shows the environment as stopped, even though it is not, because the stop job never ran.
Steps to reproduce
- Use a .gitlab-ci.yml with a start job and a stop job (example here).
- Deploy the environment and wait for it to be auto-stopped.
- Deploy the environment again. After the auto_stop_in time passes (checked at most hourly, because the check runs in a Sidekiq cron job; see #240886), the environment shows as stopped, but the stop job never ran again.
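Because the check runs in an hourly cron job, the actual stop can lag behind auto_stop_in, which matters when timing the reproduction. The Ruby sketch below estimates when a deploy would be processed. It assumes the cron job fires once per hour at a fixed minute (the hypothetical cron_minute parameter); that is an illustration, not GitLab's actual schedule, and earliest_auto_stop is a hypothetical helper.

```ruby
# Hypothetical helper, not GitLab code. ASSUMPTION: the cron job that
# processes auto-stops fires once per hour at minute `cron_minute` (UTC).
# Returns the first cron tick at or after the moment auto_stop_in expires.
def earliest_auto_stop(deployed_at, auto_stop_in, cron_minute: 0)
  due = (deployed_at + auto_stop_in).getutc
  # Cron tick within the same UTC hour as `due`.
  tick = Time.utc(due.year, due.month, due.day, due.hour, cron_minute)
  # If that tick is already in the past relative to `due`, wait for the next hourly run.
  tick += 3600 while tick < due
  tick
end

deployed = Time.utc(2021, 5, 26, 10, 15)
puts earliest_auto_stop(deployed, 5 * 60)                  # auto_stop_in: 5 minutes
puts earliest_auto_stop(deployed, 5 * 60, cron_minute: 30) # same deploy, cron at :30
```

With the default cron_minute of 0, a 5-minute auto_stop_in set at 10:15 UTC would only be processed at 11:00 UTC, 40 minutes after it expired.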
Example Project
https://gitlab.com/cat/repro-auto-stop/-/pipelines/313613453
You can see that both jobs ran, but the job list at https://gitlab.com/cat/repro-auto-stop/-/jobs shows that the stop job ran only once, for the first auto-stop, even though the environment shows as stopped on the Environments page.
What is the current bug behavior?
Environments that use auto_stop_in are not properly stopped once they have been redeployed.
What is the expected correct behavior?
auto_stop_in should be respected, and environments should be properly auto-stopped, including after a redeploy.
Possible fixes
The problem is that Ci::StopEnvironmentsService#execute_in_batch preloads the stop_actions for every environment, and a condition in the CTE only retrieves jobs whose status is in BLOCKED_STATUS (manual, scheduled). Once the stop job has run, its status is success, so on the next auto-stop it is no longer retrieved. However, all the environments are updated to state: stopped either way, so the stop_actions for a redeployed environment are never fetched or run.
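A minimal, self-contained Ruby model of the flow described above can make the failure easier to see. It does not use GitLab code: Job, Env, and stop_in_batch are hypothetical stand-ins, and the status filter simply mirrors the BLOCKED_STATUS condition described in this issue.

```ruby
# Hypothetical stand-ins for GitLab records; not GitLab's actual classes.
BLOCKED_STATUS = %w[manual scheduled].freeze

Job = Struct.new(:name, :status)
Env = Struct.new(:name, :state, :stop_job)

# Models the reported flow: only stop jobs still in a blocked status are
# collected, yet every environment in the batch is marked as stopped.
def stop_in_batch(envs)
  actions = envs.map(&:stop_job).compact
                .select { |job| BLOCKED_STATUS.include?(job.status) }
  envs.each { |env| env.state = "stopped" }
  actions.each { |job| job.status = "success" } # simulate running the job
  actions.map(&:name)
end

stop = Job.new("stop_review", "manual")
env  = Env.new("review", "available", stop)

first_run = stop_in_batch([env])   # stop job collected and run
env.state = "available"            # deploy job retried, environment restarts
second_run = stop_in_batch([env])  # stop job is now "success", so it is skipped

p first_run   # prints ["stop_review"]
p second_run  # prints []
p env.state   # prints "stopped", although no stop job ran the second time
```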
Manually finding the stop action job and running it works, for example with Environment.find(id).stop_with_action!(User.find_by_username("root")), so running a stop job whose status is success does not seem to be a limitation.
Currently, the only workaround seems to be manually finding the stop job and running it again, which is not easy on its own because the environment is already marked as stopped.
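One possible direction for a fix, sketched as an assumption rather than the actual change: treat a stop job with status success as retryable (mirroring the manual workaround, which works on such jobs), and only mark an environment as stopped once its stop job has actually been run. Job, Env, RETRYABLE_STATUS, and stop_in_batch are hypothetical stand-ins, not GitLab's implementation.

```ruby
# Hypothetical stand-ins; not GitLab's actual classes or status lists.
BLOCKED_STATUS = %w[manual scheduled].freeze
# ASSUMPTION: a stop job that already succeeded may be run again.
RETRYABLE_STATUS = (BLOCKED_STATUS + %w[success]).freeze

Job = Struct.new(:name, :status, :runs)
Env = Struct.new(:name, :state, :stop_job)

# Marks an environment as stopped only after its stop job has been run.
# Returns the names of the stop jobs that were run.
def stop_in_batch(envs)
  envs.filter_map do |env|
    job = env.stop_job
    if job.nil?
      env.state = "stopped" # nothing to run (assumption)
      next
    end
    next unless RETRYABLE_STATUS.include?(job.status)

    job.runs += 1          # simulate playing or retrying the stop job
    job.status = "success"
    env.state = "stopped"
    job.name
  end
end

stop = Job.new("stop_review", "manual", 0)
env  = Env.new("review", "available", stop)

stop_in_batch([env])               # first auto-stop
env.state = "available"            # deploy job retried
second_run = stop_in_batch([env])  # the success job is retried this time

p second_run # prints ["stop_review"]
p stop.runs  # prints 2
```

The design choice here is to tie the state change to the job actually running, so the UI can no longer report an environment as stopped when no stop job ran.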