Environment auto_stop_in can be delayed for up to an hour
Problem to solve
Discovered while debugging #234143 (comment 400209902)
auto_stop_in is not precise to the minute, which is OK, but sometimes it can take 15-30 minutes for the stop job to actually start.
This leads to confusing behavior in two ways:
- I expected my environment to stop 20 min ago, and it's still up
- There was an "auto stop in" on my environment, but now it's gone (this is a backend/UI bug: we only show this time if it's in the future, so if the environment should have stopped 10 minutes ago but is still up, "auto stop in" will be empty).
Example
https://gitlab.com/vshushlin/auto-stop-test/-/blob/master/.gitlab-ci.yml
stages:
  - review

review_app:
  stage: review
  before_script:
    - echo $CI_COMMIT_REF_SLUG
    - echo "review_app before_script"
  environment:
    name: review-master/${CI_COMMIT_REF_SLUG}
    auto_stop_in: 1 minutes
    on_stop: stop-review
  script: echo "A++++++++++ Would review again"

stop-review:
  stage: review
  variables:
    GIT_STRATEGY: none
  environment:
    name: review-master/${CI_COMMIT_REF_SLUG}
    action: stop
  script: echo "Stop review"
  when: manual
And the pipeline: https://gitlab.com/vshushlin/auto-stop-test/-/pipelines/180606188/builds
In the example above, the environment should stop 1 minute after deploy, but the stop job actually started 33 minutes later.
Source of the problem
We run the auto stop cron job once per hour at minute 24, and it does up to 1000 iterations: https://gitlab.com/gitlab-org/gitlab/-/blob/c2d68b944d16fd946d2e5097aaaf73453c5de6a3/app/services/environments/auto_stop_service.rb#L23
It is also limited by a 45-minute timeout.
This means that:
- From the 9th to the 24th minute of each hour, auto-stop doesn't run (a worker started at minute 24 times out by minute 9 of the next hour), and all environments that expire in that window are stopped in one batch at the 24th minute.
- If the worker performs its 1000 iterations in less than 45 minutes, this time window becomes even bigger.
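The timing above can be sketched as a small model. This is a rough illustration, not the actual worker logic: the function names are hypothetical, and it assumes that an environment expiring while the worker is running is picked up right away, while anything expiring after the worker finishes waits for the next run at minute 24.

```python
from datetime import datetime, timedelta

CRON_MINUTE = 24                      # from the issue: worker starts at minute 24 of every hour
LOOP_TIMEOUT = timedelta(minutes=45)  # from the issue: worker runs for at most 45 minutes


def next_worker_start(moment: datetime) -> datetime:
    """Return the first worker start (hh:24) at or after `moment`."""
    start = moment.replace(minute=CRON_MINUTE, second=0, microsecond=0)
    if start < moment:
        start += timedelta(hours=1)
    return start


def estimated_stop_delay(auto_stop_at: datetime,
                         run_duration: timedelta = LOOP_TIMEOUT) -> timedelta:
    """Estimate how long after `auto_stop_at` the stop job starts.

    Simplified model (an assumption, not the real service): expiry during a
    worker run means no delay; otherwise wait for the next worker start.
    """
    run_duration = min(run_duration, LOOP_TIMEOUT)
    upcoming = next_worker_start(auto_stop_at)
    previous = upcoming - timedelta(hours=1)
    worker_still_running = auto_stop_at < previous + run_duration
    if upcoming == auto_stop_at or worker_still_running:
        return timedelta(0)
    return upcoming - auto_stop_at


if __name__ == "__main__":
    # Expires at 10:10, after the 09:24 run timed out at 10:09: waits until 10:24.
    print(estimated_stop_delay(datetime(2020, 8, 24, 10, 10)))
    # Worker finished in 5 minutes (10:24-10:29); expiry at 10:30 waits until 11:24.
    print(estimated_stop_delay(datetime(2020, 8, 24, 10, 30), timedelta(minutes=5)))
```

Under this model, a worker that uses its full 45 minutes gives a delay of at most 15 minutes (expiry at minute 9), and the shorter the run, the closer the worst case gets to a full hour.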
I don't know a nice fix for this off the top of my head, but it should be relatively easy to fix. At the same time, it's probably not super urgent, since all that happens is that the stop job starts up to an hour late.
Proposal
Highlight on the UI that the stop will happen in the future.
TBD: copy, see the related documentation