Provide a way to restore deleted projects

Problem to solve

There is currently no easy way to recover from the deletion of a project or group. Restoring an individual project or group is difficult and time-consuming for the infrastructure team, because the backup and disaster recovery strategy is geared toward recovering from whole-system failures.

The current process for this is detailed in the runbook: https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/uncategorized/deleted-project-restore.md

Intended users

This will primarily benefit administrators of GitLab.com, but also administrators of any other instance who may be called upon to restore deleted projects.

Impact

Restoring deleted projects takes the production team several days of work across multiple engineers, and there is no guarantee that all data can be recovered. There are several situations where we may need to do restores that go beyond a user accidentally deleting data.

As of 2023, we are averaging 2-3 restores per month, requested by customers, the security team, and legal. Implementing this recommendation would make those requests trivial, and in cases where the restore target is a temporary forensic instance rather than gitlab.com, restores may not need to involve the infrastructure team at all.

Here are some examples of needing to do this

This issue goes into more depth: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16668#note_1582975649

Proposal

The proposed solution to this problem is to use the existing export functionality to force an export before deletion. This export will be saved somewhere (e.g. cloud object storage) for a time, and can be deleted later by a retention policy or lifecycle rule outside of the application.
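The export-before-delete flow described above could be sketched roughly as follows. This is a minimal illustration in Python, not GitLab's actual implementation: the `Project` class, `export_project`, and `delete_project` are hypothetical names, and a JSON file in a local directory stands in for both the existing export functionality and the object storage bucket. The point it shows is the ordering: the project is removed only after the export has been written successfully.

```python
import json
import tempfile
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class Project:
    """Hypothetical stand-in for a project record."""
    id: int
    path: str
    data: dict = field(default_factory=dict)


def export_project(project: Project, archive_root: Path, now: datetime) -> Path:
    """Write an export archive and return its location.

    Assumption: a real implementation would call the existing export
    functionality and upload to object storage. Here a JSON file in a
    local directory stands in for both.
    """
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    target = archive_root / f"project-{project.id}" / f"{stamp}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps({"path": project.path, "data": project.data}))
    return target


def delete_project(projects: dict, project_id: int, archive_root: Path, now=None) -> Path:
    """Delete a project only after its export has been written."""
    now = now or datetime.now(timezone.utc)
    project = projects[project_id]
    # If the export raises, the deletion below never runs.
    archive = export_project(project, archive_root, now)
    del projects[project_id]
    return archive
```

Expiry of old archives is deliberately absent from the sketch, since the proposal leaves that to a retention policy or lifecycle rule on the storage side.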

The suggestion above was implemented, and it addressed the case where a user accidentally deletes a project. However, there are still other cases where a restore is necessary.

If restores are going to happen more and more often, we need a mechanism that only marks a project for deletion, with a separate reap process that permanently removes it X days later, so that data involved in "near incident" situations can still be recovered.
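The mark-then-reap idea can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Project` class, the `marked_for_deletion_at` field, the function names, and the 7-day value standing in for "X days". It is not a description of how GitLab stores or removes projects.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# The "X days" window from the text; 7 is an arbitrary placeholder.
RETENTION = timedelta(days=7)


@dataclass
class Project:
    """Hypothetical project record with a soft-delete timestamp."""
    id: int
    path: str
    marked_for_deletion_at: Optional[datetime] = None


def mark_for_deletion(project: Project, now: datetime) -> None:
    """Record the deletion request without removing any data."""
    project.marked_for_deletion_at = now


def restore(project: Project) -> None:
    """Undo a pending deletion; possible any time before the reaper runs."""
    project.marked_for_deletion_at = None


def reap(projects: dict, now: datetime, retention: timedelta = RETENTION) -> list:
    """Permanently remove projects marked at least `retention` ago.

    Returns the ids that were removed.
    """
    expired = [
        pid
        for pid, project in projects.items()
        if project.marked_for_deletion_at is not None
        and now - project.marked_for_deletion_at >= retention
    ]
    for pid in expired:
        del projects[pid]
    return expired
```

With this shape, a restore inside the window only has to clear a timestamp, which is what would make "near incident" recoveries cheap compared to a restore from backups.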

Permissions and Security

No changes to the application's permission model should be necessary. People who can delete can still delete.

The GitLab instance will need permission to write to the storage used for the exports. Access to that storage pool can either be managed independently, or an interface can be built into the product for browsing and restoring exports.

What does success look like, and how can we measure that?

Currently it takes the GitLab.com infrastructure team about 3 days of work to restore a project with all of its issues, MRs, and files. If this can be reduced to a few hours or less, then this change is successful.

See Also:

Edited by Alex Hanselka