Proposal: Use sync/rsync commands to restore backups

Proposal

During backup, the sync/rsync command of the CLI that matches the storage provider is used (s3cmd sync, aws s3 sync or gsutil rsync). This lets us rely on the features each provider/CLI offers to speed up the process and make it more dependable.

s3cmd

  Synchronize a directory tree to S3 (checks files freshness using size and md5 checksum, unless overridden by options, see below)
      s3cmd sync LOCAL_DIR s3://BUCKET[/PREFIX] or s3://BUCKET[/PREFIX] LOCAL_DIR or s3://BUCKET[/PREFIX] s3://BUCKET[/PREFIX]

aws

  sync
      aws s3 sync <LocalPath> <S3Uri> or <S3Uri> <LocalPath> or <S3Uri> <S3Uri>

gsutil

  The gsutil rsync command makes the contents under dst_url the same as the
  contents under src_url, by copying any missing files/objects (or those whose
  data has changed), and (if the -d option is specified) deleting any extra
  files/objects. src_url must specify a directory, bucket, or bucket
  subdirectory. For example, to make gs://mybucket/data match the contents of
  the local directory "data" you could do:

    gsutil rsync -d data gs://mybucket/data

  To recurse into directories use the -r option:

    gsutil rsync -d -r data gs://mybucket/data

  To copy only new/changed files without deleting extra files from
  gs://mybucket/data leave off the -d option:

    gsutil rsync -r data gs://mybucket/data

  If you have a large number of objects to synchronize you might want to use the
  gsutil -m option, to perform parallel (multi-threaded/multi-processing)
  synchronization:

    gsutil -m rsync -d -r data gs://mybucket/data

This same process can be used to restore a backup. Once the bucket has been backed up to tmp-backups, instead of invoking cleanup we can call restore_from_backup directly. Rather than looping over every single file, restore_from_backup would rely on sync/rsync to upload the missing or changed files and to remove the files that do not exist in the backup.

This might work in both scenarios: restoring to a freshly installed instance, or recovering a previous backup on an existing instance.
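
The restore step proposed above could be sketched roughly as follows. This is only an illustration, not the existing implementation: the function name restore_bucket, the tool labels (s3cmd, awscli, gsutil) and the DRY_RUN variable are hypothetical. The delete flags used are the ones each CLI documents for removing destination objects that are absent from the source (--delete-removed for s3cmd, --delete for aws s3 sync, -d for gsutil rsync).

```shell
#!/bin/sh
# Sketch only: restore_bucket and DRY_RUN are hypothetical names, not part of
# the existing backup tooling.

# Mirror a local directory into a bucket: upload missing/changed files and
# delete remote objects that are not present in the backup.
restore_bucket() {
  tool="$1"      # one of: s3cmd, awscli, gsutil (labels chosen for this sketch)
  src_dir="$2"   # local directory holding the extracted backup
  dst_url="$3"   # destination, e.g. s3://BUCKET or gs://BUCKET

  case "$tool" in
    # s3cmd needs the trailing slash to sync the directory contents
    # rather than the directory itself.
    s3cmd)  set -- s3cmd sync --delete-removed "$src_dir/" "$dst_url/" ;;
    awscli) set -- aws s3 sync --delete "$src_dir" "$dst_url" ;;
    gsutil) set -- gsutil -m rsync -d -r "$src_dir" "$dst_url" ;;
    *)      echo "unsupported tool: $tool" >&2; return 1 ;;
  esac

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"    # print the command instead of running it
  else
    "$@"
  fi
}
```

Setting DRY_RUN=1 before calling, for example, restore_bucket gsutil /tmp/registry gs://registry prints the command that would run, which makes it possible to review the destructive delete flag before applying it to a live bucket.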

Edited Jun 01, 2022 by Ferran Vidal