Recover stuck stopping environments (!132300)

Hunter Stewart requested to merge hustewart-recover-failed-stopping into master Sep 20, 2023

What does this MR do and why?

See Recover and revert stuck-stopping environments (#425161 - closed) for more context.

When jobs that stop environments fail, the environment can get stuck in a state of "stopping." We want those environments to recover to a state of "available."

This MR addresses that concern with the following:

adds a new state event to environments to represent going from "stopping" back to "available"
adds a new worker to fire the state event given the proper conditions
makes deployables enqueue that new worker when they fail
adds the changes generated by the worker-related rake tasks that are run when adding a new worker
adds specs
updates environments spec factory
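
The flow described in the list above can be sketched in plain Ruby. This is a minimal illustration, not the code in this MR: the event name recover_stuck_stopping, the Job struct, the in-memory queue, and the conditions checked in the worker are all assumptions made for the example. Only the worker name StopJobFailedWorker comes from the validation steps in this description.

```ruby
# Plain-Ruby sketch of the recovery flow. Not the GitLab implementation:
# every name except StopJobFailedWorker is hypothetical.

class Environment
  attr_reader :name, :state

  def initialize(name, state: :available)
    @name = name
    @state = state
  end

  # Running a stop action moves the environment to "stopping".
  def stop_with_action!
    @state = :stopping if @state == :available
  end

  # Hypothetical state event: "stopping" -> "available".
  # Returns false and leaves the state untouched from any other state.
  def recover_stuck_stopping
    return false unless @state == :stopping

    @state = :available
    true
  end
end

# A deployable (job) that enqueues the worker when it fails.
Job = Struct.new(:id, :environment, :stop_action, :status) do
  def fail!(queue)
    self.status = :failed
    queue << self # stands in for enqueuing StopJobFailedWorker asynchronously
  end
end

class StopJobFailedWorker
  # Assumed conditions: the job failed and it was the environment's stop action.
  def perform(job)
    return unless job.status == :failed && job.stop_action

    job.environment.recover_stuck_stopping
  end
end

# Usage: a failed stop job returns the environment to "available".
queue = []
env = Environment.new("review/my-branch")
env.stop_with_action!
Job.new(1, env, true, :running).fail!(queue)
StopJobFailedWorker.new.perform(queue.shift)
puts env.state # prints "available"
```

Keeping the transition behind a dedicated event means the worker can run safely more than once: if the environment has already left "stopping", the event is a no-op.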

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

How to set up and validate locally

#363197 (closed) provides the steps to verify the behavior.

I recommend running through the steps on master first to see the current behavior.

After that, switch to this MR's branch and run through it again, noting the following differences.

on master

the relevant Environment will be stuck in stopping (you can check in the rails console)
the stop job will show up as requiring manual action when you reach the end of the steps

on this branch

the relevant Environment will be in a state of available
the stop job will run without manual action required
you can tail the background jobs with gdk tail rails-background-jobs | grep StopJobFailedWorker and look for the job being processed

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Edited Sep 27, 2023 by Hunter Stewart
