`Projects::DestroyService` should clean up after itself when it fails
We want to prevent something like https://gitlab.com/gitlab-com/infrastructure/issues/1226 from happening again, where a call inside `Projects::DestroyService` raised an exception, the service never finished, and the project was left in the `pending_delete` state. This meant that the project was never actually cleaned up from the DB or the disk, and it became impossible to create a new project with the same name as this deleted-but-not-quite project: https://gitlab.com/gitlab-com/support-forum/issues/1601.
We currently have to perform a manual cleanup of these `pending_delete` projects to get stuff running again, as seen here: https://gitlab.com/gitlab-com/infrastructure/issues/888#note_23730241
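For reference, a rough sketch of what that kind of console cleanup could look like (the actual steps are in the linked note; the admin lookup and the assumption that re-running the service is safe for these half-deleted rows are mine):

```ruby
# Hypothetical Rails console cleanup for stuck pending_delete projects;
# see the linked infrastructure note for the steps actually used.
admin = User.find_by(admin: true)

Project.unscoped.where(pending_delete: true).find_each do |project|
  # Re-run the destroy service so both the DB row and the on-disk
  # repository actually get removed this time.
  ::Projects::DestroyService.new(project, admin).execute
end
```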
We also plan to run this cleanup once on all customer instances to clean up any stuck `pending_delete` projects from the past, as seen here: https://gitlab.com/gitlab-org/gitlab-ce/issues/20984.
Of course, this cleanup is just a band-aid rather than a real fix, because there's currently nothing stopping this whole thing from happening again when another call in `Projects::DestroyService` starts raising errors and leaving projects in the `pending_delete` state.
I suggest we:

- Make sure we catch any error that can happen inside `Projects::DestroyService#remove_registry_tags`, or really, anywhere inside `Projects::DestroyService#execute` (see the sketch after this list)
- Store a human-readable error message in `projects.delete_error` (like we already do with import and mirror errors that happen inside a worker)
- Unmark the project as `pending_delete`
- Show that error message to a project master or owner on the project homepage
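To make that concrete, here's a minimal sketch of what the rescue behaviour in `Projects::DestroyService#execute` could look like. This is not a final implementation: the `attempt_rollback` helper name is made up here, and it assumes a `projects.delete_error` string column exists (or gets added).

```ruby
module Projects
  class DestroyService < BaseService
    def execute
      # ... existing removal steps (remove_registry_tags, repository
      # removal, DB cleanup) would run here ...
    rescue => error
      attempt_rollback(error.message)
      false
    end

    private

    # Record why the deletion failed and take the project back out of
    # pending_delete, so the owner sees the error instead of a project
    # stuck half-deleted. (Hypothetical helper.)
    def attempt_rollback(message)
      return unless project

      project.update(delete_error: message, pending_delete: false)
    end
  end
end
```

The last step would then just be a matter of showing `project.delete_error` to masters/owners on the project homepage when it's set.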
It will be unexpected that a deleted project wasn't really deleted, but this should only happen in very exceptional cases; it makes it a lot easier for us and the customer to understand what happened, and it doesn't leave projects in a state that they have to be manually pulled out of.
I think we should do both https://gitlab.com/gitlab-org/gitlab-ce/issues/20984 and this issue in %9.1, since this is an issue that has kept coming back over the last few months and that we really need to get a handle on.
/cc @zj @stanhu @mydigitalself