Improve how we deal with projects that are pending_delete but whose removal job failed

Problem to solve

We have a documented procedure for this in: https://gitlab.com/gitlab-org/gitlab/blob/a67ad6249dc784f328ce23d77bd7ae1e8ebe57b5/doc/administration/troubleshooting/gitlab_rails_cheat_sheet.md#L193-193

The main problem here is that there is no visibility into the problem. With legacy storage it did not go unnoticed, because a project that failed removal would cause a name conflict on disk, which could lead to people pinging support. With Hashed Storage, this is no longer the case.

During the Hashed Storage migration in https://gitlab.com/gitlab-com/gl-infra/production/issues/935 we found that there are still 163 projects marked as pending_delete whose removal failed and was then forgotten.

Intended users

Internal persona

Further details

Proposal

We can do a few things here.

Expose projects that have their removal in a stale state in a rake task (we could check a combination of pending_delete and updated_at against a defined threshold).
Create a cronjob to retry the stale ones from time to time.
Add the stale removal to the system_checks. The check should fail when there are pending_delete projects that are past our threshold.
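
The stale-state detection and the pass/fail check described above could look roughly like the sketch below. It is plain Ruby, not GitLab code: the `Project` struct, the method names `stale_removals` and `stale_removal_check`, and the 24-hour threshold are all assumptions made for illustration. A real implementation would query the projects table and pick its own threshold.

```ruby
# Hypothetical stand-in for a project record; this is not GitLab's model.
Project = Struct.new(:id, :pending_delete, :updated_at, keyword_init: true)

# Assumed threshold; the proposal leaves the actual value open.
STALE_THRESHOLD_SECONDS = 24 * 60 * 60

# A removal is treated as stale when the project is still flagged
# pending_delete and has not been updated since the cutoff.
def stale_removals(projects, now: Time.now, threshold: STALE_THRESHOLD_SECONDS)
  cutoff = now - threshold
  projects.select { |p| p.pending_delete && p.updated_at < cutoff }
end

# Check-style wrapper: passes only when no stale removals exist.
def stale_removal_check(projects, now: Time.now, threshold: STALE_THRESHOLD_SECONDS)
  stale = stale_removals(projects, now: now, threshold: threshold)
  { pass: stale.empty?, stale_ids: stale.map(&:id) }
end

# Sample data: one stale removal, one recent removal, one normal project.
now = Time.utc(2019, 7, 1, 12, 0, 0)
projects = [
  Project.new(id: 1, pending_delete: true, updated_at: now - 3 * 24 * 60 * 60),
  Project.new(id: 2, pending_delete: true, updated_at: now - 60),
  Project.new(id: 3, pending_delete: false, updated_at: now - 30 * 24 * 60 * 60)
]

result = stale_removal_check(projects, now: now)
if result[:pass]
  puts "OK: no stale pending_delete projects"
else
  puts "FAIL: stale pending_delete projects: #{result[:stale_ids].join(', ')}"
end
```

With the sample data, only project 1 is reported: project 2 is still within the threshold and project 3 is not marked for deletion. The same `stale_removals` list could feed the retry cronjob, so the rake task, the cronjob, and the system check would all share one definition of "stale".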

Permissions and Security

System access (terminal)

Documentation

Availability & Testing

What does success look like, and how can we measure that?

We have no pending_delete projects that are stale (and no bug preventing them from being removed).
We have system checks telling us when a project should have been deleted already but has not been.

What is the type of buyer?

Links / references

Edited Jun 30, 2025 by 🤖 GitLab Bot 🤖