Create untarred backups in named directories

Proposal

When creating a backup, you can choose to create an untarred backup by using SKIP=tar (https://docs.gitlab.com/ee/raketasks/backup_restore.html#skipping-tar-creation). With this option, the files that would have gone into the tar file are left in the root of the backup directory:

$ ls tmp/backups/
artifacts.tar.gz        builds.tar.gz  lfs.tar.gz       pages.tar.gz  terraform_state.tar.gz
backup_information.yml  db             packages.tar.gz  repositories  uploads.tar.gz

There are several issues with this:

  • If you create another untarred backup, it has to overwrite the existing untarred backup. For example, if the second backup run skips artifacts (SKIP=tar,artifacts), the artifacts of the previous backup are still present in the backup directory.
  • We have no mechanism to automatically delete old backups (https://docs.gitlab.com/ee/raketasks/backup_restore.html#limit-backup-lifetime-for-local-files-prune-old-backups), because that feature relies on being able to extract the backup time from the tar filename.
  • It is difficult to support untarred backups on object storage. These files would be continuously overwritten instead of giving a time series of backups.

We should instead create untarred backups in a directory with the same name as the tar file we would have created. For example, if we would have created 1653000200_2022_05_20_15.0.0-pre_gitlab_backup.tar then we should put the untarred backup in a directory called 1653000200_2022_05_20_15.0.0-pre_gitlab_backup.
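As a rough illustration of the naming idea, here is a minimal Ruby sketch. The helper names (`untarred_backup_dir`, `backup_time`, `expired?`) and the `BACKUP_NAME` pattern are hypothetical and inferred only from the example file name above; they are not taken from the existing backup code. The sketch shows that once the directory carries the same name as the tar file, the backup time can be recovered from it, which is what pruning old backups needs.

```ruby
# Hypothetical pattern, inferred from the example name in this issue:
# <unix timestamp>_<YYYY>_<MM>_<DD>_<version>_gitlab_backup
BACKUP_NAME = /\A(?<timestamp>\d+)_\d{4}_\d{2}_\d{2}_.+_gitlab_backup\z/

# Directory name for an untarred backup: the tar file name without ".tar".
def untarred_backup_dir(tar_file_name)
  File.basename(tar_file_name, ".tar")
end

# Backup time taken from the leading Unix timestamp, or nil if the name
# does not look like a backup directory (for example "db" or "repositories").
def backup_time(dir_name)
  match = BACKUP_NAME.match(dir_name)
  match && Time.at(match[:timestamp].to_i).utc
end

# True when the directory name is a backup older than keep_seconds.
def expired?(dir_name, keep_seconds, now: Time.now)
  time = backup_time(dir_name)
  !time.nil? && (now - time) > keep_seconds
end

dir = untarred_backup_dir("1653000200_2022_05_20_15.0.0-pre_gitlab_backup.tar")
puts dir
puts expired?(dir, 7 * 24 * 60 * 60)
```

Names that do not match the pattern are never reported as expired, so unrelated files in the backup directory would be left alone under this sketch.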

Along with the benefits above, this would dramatically simplify cleaning up failed backups, since we could simply remove the entire directory instead of having to guess what was just created: https://gitlab.com/gitlab-org/gitlab/-/blob/3c90e18058ed9f547127d277bf7e606704af26c3/lib/backup/manager.rb#L331-340
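The cleanup could then look something like the sketch below. `with_backup_dir` is a hypothetical helper, not the current `manager.rb` code; it assumes every backup step writes only inside the named directory, so a failure can be handled by removing that one directory.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: run the backup steps inside a named directory and
# remove that directory, and only that directory, if any step fails.
def with_backup_dir(backups_root, backup_name)
  dir = File.join(backups_root, backup_name)
  FileUtils.mkdir_p(dir)
  yield dir
  dir
rescue StandardError
  # Nothing to guess: everything this run created lives under dir.
  FileUtils.rm_rf(dir) if dir
  raise
end

Dir.mktmpdir do |root|
  begin
    with_backup_dir(root, "1653000200_2022_05_20_15.0.0-pre_gitlab_backup") do |dir|
      File.write(File.join(dir, "backup_information.yml"), "placeholder")
      raise "simulated failure while dumping"
    end
  rescue RuntimeError => e
    puts "backup failed: #{e.message}"
  end
  puts Dir.children(root).empty?
end
```

Earlier successful backups in the same root are untouched, because they live in their own named directories.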