CI: auto_stop_in for job retries
Summary
The auto_stop_in keyword in a job normally triggers the environment's stop action once the configured time expires. However, the auto_stop_in setting is ignored if the job that runs the start action fails. We partially worked around this in our charts pipelines by splitting environment creation into two jobs: create_review_* and review_*. The create_review_* job only creates the environment, setting the start action and auto_stop_in on it. It cannot fail because it never runs a command that can exit with a non-zero code. The review_* job does the actual deployment to the environment. It can pass or fail, and either way the stop action kicks off after the auto_stop_in period expires and cleans up the cluster. This fix was added in !3453 (merged).
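For illustration, the two-job split could look roughly like the sketch below. This is not the actual charts configuration: the job names, stage names, environment name, scripts, and the 1 day timeout are all hypothetical, and leaving the environment block off the deploy job is an assumption made for this sketch. Only the keywords themselves (environment:name, action, on_stop, auto_stop_in) are standard GitLab CI.

```yaml
# Hypothetical sketch of the workaround; names and values are illustrative only.
stages:
  - prepare
  - review
  - cleanup

# Registers the environment and arms the auto-stop timer.
# The script does nothing that can exit non-zero, so this job should not fail.
create_review_example:
  stage: prepare
  script:
    - echo "Registering review environment"
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: start
    on_stop: stop_review_example
    auto_stop_in: 1 day

# Does the real deployment work and may pass or fail.
# Assumption: no environment block here, so a failure cannot affect the timer
# that create_review_example already set.
review_example:
  stage: review
  needs: [create_review_example]
  script:
    - echo "Deploy to the cluster here (placeholder for the real deploy step)"

# Runs when auto_stop_in expires, or when triggered manually.
stop_review_example:
  stage: cleanup
  script:
    - echo "Tear down the cluster resources here (placeholder)"
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: stop
  when: manual
```

The point of the split is that the job carrying auto_stop_in is trivially green, so the timer is always set before any job that can fail runs.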
The above "fix" does not solve the entire problem, however: when a review_* job is retried, the auto_stop_in timer is deactivated. A potential fix for the original problem (auto_stop_in not activated if the job fails) was developed in gitlab-org/gitlab#382549 (closed). That feature is currently in beta but can be enabled for individual projects on request. It will be fully available in %17.0.
We should test the new auto_stop_in behavior in a test project to see if it fits our needs and, if so, implement it in our charts pipelines so that deployments are always cleaned up after the auto_stop_in period, regardless of success, failure, or retries.