Skip generating a Git bundle if an up-to-date bundle already exists in the previous backup

Problem to solve

When using sudo gitlab-backup create, backing up Git repositories is very slow. One reason for this is that all repositories are backed up, even if they haven't changed since the last backup.

Further details

On very large GitLab instances, it is likely that most projects are not updated every hour. Skipping repositories that have not changed would allow more frequent backups without the high compute cost of bundling every repository each time.

Proposal

As an administrator, I will be able to run sudo gitlab-backup create and provide a path to a previous backup (e.g. PREVIOUS_BACKUP=path/to/last/backup).

If a path to a previous backup is provided, the repository checksum on the server is compared with the checksum of the bundle in the previous backup:

  • same checksum: reuse the Git bundle from the previous backup
  • different checksum: generate a new Git bundle for the backup

This is a performance optimization: it skips unnecessary bundle creation. It is not a storage optimization, and it does not change the storage format.
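The two cases above can be sketched roughly as follows. This is only an illustration of the decision, not the gitlab-backup implementation: the function name `backup_repository`, the `<repo>.bundle` layout, the `previous_checksums` mapping, and the `create_bundle` callback are all hypothetical.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict, Optional


def backup_repository(
    repo_name: str,
    current_checksum: str,
    previous_checksums: Dict[str, str],
    previous_backup: Optional[Path],
    target_dir: Path,
    create_bundle: Callable[[str, Path], None],
) -> str:
    """Reuse the old bundle when checksums match, otherwise create a new one.

    Returns "reused" or "created". All names and the file layout here are
    assumptions made for the sketch.
    """
    target = target_dir / f"{repo_name}.bundle"
    target.parent.mkdir(parents=True, exist_ok=True)

    if previous_backup is not None:
        old_bundle = previous_backup / f"{repo_name}.bundle"
        unchanged = previous_checksums.get(repo_name) == current_checksum
        if unchanged and old_bundle.is_file():
            # Same checksum: copy the existing bundle instead of regenerating it.
            shutil.copy2(old_bundle, target)
            return "reused"

    # Different checksum, missing bundle, or no previous backup: full bundle.
    create_bundle(repo_name, target)
    return "created"
```

Because the reused bundle is copied as-is, the resulting backup has the same layout as a full backup, which matches the point that the storage format does not change.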

Checksum comparison

The checksum of a repo bundle must be compared to the current repository checksum. Possible approaches include:

  • store the checksum with the backup in the filename
  • store the checksum with the backup in a manifest file of some sort
  • generate the checksum from the bundle on the fly
  • store the checksum in the database
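As one illustration of the manifest option, the checksums could be written next to the bundles as a small JSON file. This is a sketch under assumptions: the file name `repository_checksums.json` and both helper functions are hypothetical and not part of the current backup format.

```python
import json
from pathlib import Path
from typing import Dict

# Hypothetical manifest file name, stored at the top level of a backup directory.
MANIFEST_NAME = "repository_checksums.json"


def write_manifest(backup_dir: Path, checksums: Dict[str, str]) -> None:
    """Record one checksum per repository alongside the bundles."""
    manifest = backup_dir / MANIFEST_NAME
    manifest.write_text(json.dumps(checksums, indent=2, sort_keys=True))


def read_manifest(backup_dir: Path) -> Dict[str, str]:
    """Load checksums from a previous backup; empty if there is no manifest."""
    manifest = backup_dir / MANIFEST_NAME
    if not manifest.is_file():
        # A backup made before manifests existed: treat every repo as changed.
        return {}
    return json.loads(manifest.read_text())
```

A manifest keeps the bundle files themselves untouched and lets the backup job read all previous checksums with a single file read, at the cost of one more file that has to stay in sync with the bundles.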

When evaluating these approaches we should consider:

  • time efficiency to restore
  • time efficiency to backup
  • use standard Git bundle output (using a custom bundle format should be avoided)
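For the "on the fly" option, a standard Git bundle already begins with a text header listing the refs it contains, so a checksum could in principle be derived without a custom bundle format. The sketch below parses that header (v2 and v3 bundles) and hashes the ref list. The SHA-256-over-sorted-refs scheme is an assumption for illustration only; it is not GitLab's repository checksum algorithm, and the server side would need to compute the same value for the comparison to work.

```python
import hashlib
from typing import BinaryIO, Iterable, List, Tuple


def read_bundle_refs(stream: BinaryIO) -> List[Tuple[str, str]]:
    """Return (refname, object id) pairs from a Git bundle header."""
    signature = stream.readline().rstrip(b"\n")
    if signature not in (b"# v2 git bundle", b"# v3 git bundle"):
        raise ValueError("not a Git bundle")

    refs = []
    for raw in iter(stream.readline, b""):
        line = raw.rstrip(b"\n")
        if not line:
            break  # a blank line ends the header; pack data follows
        if line[:1] in (b"-", b"@"):
            continue  # prerequisite commit ("-") or v3 capability ("@")
        oid, _, name = line.partition(b" ")
        refs.append((name.decode(), oid.decode()))
    return refs


def refs_checksum(refs: Iterable[Tuple[str, str]]) -> str:
    """Hypothetical checksum: SHA-256 over the sorted ref list."""
    digest = hashlib.sha256()
    for name, oid in sorted(refs):
        digest.update(f"{oid} {name}\n".encode())
    return digest.hexdigest()
```

Only the header is read, so the cost per bundle is small, but every bundle in the previous backup still has to be opened, which is slower at backup time than reading a stored checksum.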

Support for SKIP=tar

The default backup behavior is to generate a single large tarball.

It is probably reasonable to support this feature only when SKIP=tar is used, to avoid the need to untar the previous backup to read checksums and extract Git bundles of unchanged repositories.

Links / references

https://docs.gitlab.com/ee/raketasks/backup_restore.html#skipping-tar-creation

Edited by James Ramsay (ex-GitLab)