Explore the validity of a "safe delete" mechanism

This issue captures discussion of a safe delete feature, documents related design changes, and records the final decision on whether and how this should be handled.

Premise

The GitLab.com operational databases each contain dozens of terabytes of data. In an effort to rein this in, we have begun evaluating data retention policies. To provide some assurance that it's okay to "let go" of data, we should strive to provide an "undelete" feature for our database.

Challenges

The difficulty in providing a safe delete mechanism arises from the confluence of the following points:

  1. Postgres has no "native" capability in its data model to accomplish this. When a tuple is marked as "deleted", it is retained only long enough to remain visible to any concurrent transactions that may still need it. After this, the tuple is considered "dead" and its space is later reclaimed by vacuum for writing new rows.
  2. The operational constraints under which the database runs, together with the complexity of the application, make it extremely difficult to implement a safe delete feature consistently at the application level. Supporting "deleted" flags requires new indexes, which take significant effort to create and are expensive to maintain. Queries must also be restructured to take those flags into account, and that restructuring can incur unforeseen performance penalties. Furthermore, the application has 900 tables, and each table/model would need to implement this functionality piecemeal.

to be continued