CI: auto_stop_in for job retries

Summary

The auto_stop_in keyword in a job normally runs the environment's stop action after the configured time expires. However, auto_stop_in is ignored if the job that runs the start action fails. We partially worked around this in our charts pipelines by splitting environment creation into two jobs: create_review_* and review_*. The create_review_* job only creates the environment, setting the start action and auto_stop_in on it. It can never fail, as it never performs any action that can exit with a non-zero exit code. The review_* job does the actual work of deploying the environment. It can pass or fail; either way, the stop action kicks off once the auto_stop_in period expires and cleans up the cluster. This workaround was added in !3453 (merged).
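
For illustration, the two-job split described above might look roughly like the following. This is a minimal sketch, not the actual charts configuration: the job suffixes, stage names, environment name, duration, and the `./deploy.sh` script are all hypothetical placeholders, and how the real review_* job references the environment is an assumption.

```yaml
# Sketch only: names, stages, and scripts are placeholders, not the charts pipeline.
stages:
  - prepare
  - review

create_review_example:
  stage: prepare
  script:
    - echo "Registering review environment"   # nothing here exits non-zero
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: start                   # the start action lives on the job that cannot fail
    on_stop: stop_review_example
    auto_stop_in: 1 day             # placeholder duration

review_example:
  stage: review
  script:
    - ./deploy.sh                   # hypothetical deploy script; may pass or fail
  environment:
    name: review/$CI_COMMIT_REF_SLUG   # assumption: the real job may reference it differently

stop_review_example:
  stage: review
  variables:
    GIT_STRATEGY: none              # the branch may already be gone when this runs
  script:
    - echo "Cleaning up the cluster"   # placeholder for the real cleanup
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: stop
  when: manual
```

The point of the split is that the job carrying auto_stop_in always succeeds, so the timer is set even when the deployment itself fails.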

The workaround above does not solve the entire problem, however: when a review_* job is retried, the auto_stop_in timer is deactivated. A potential fix for the original problem (auto_stop_in not activated if the job fails) was implemented in gitlab-org/gitlab#382549 (closed). That feature is currently in beta but can be enabled for individual projects on request. It will be fully available in %17.0.

We should test the new auto_stop_in behavior in a test project to see if it fits our needs and, if so, implement it in our charts pipelines so that deployments are always cleaned up after the auto_stop_in period, regardless of success, failure, or retries.
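
As a starting point for that test project, a pipeline along these lines could exercise the failure case. It is a rough sketch under stated assumptions: the job names, environment name, and duration are hypothetical, and it assumes the new behavior has already been enabled for the test project.

```yaml
# Hypothetical test-project pipeline; all names and durations are placeholders.
stages:
  - deploy

deploy_always_fails:
  stage: deploy
  script:
    - exit 1                        # force a failure in the job that starts the environment
  environment:
    name: autostop-test
    on_stop: stop_autostop_test
    auto_stop_in: 10 minutes        # short placeholder so the result is visible quickly

stop_autostop_test:
  stage: deploy
  script:
    - echo "stop action ran"        # seeing this in the job log means cleanup was triggered
  environment:
    name: autostop-test
    action: stop
  when: manual
```

Running it once as-is and once with a manual retry of deploy_always_fails would cover both the failure and retry cases; in each, the thing to check is whether stop_autostop_test runs after the auto_stop_in period.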