
Draft: Add backup list method to Sink

James Fargher requested to merge list_backups into master

Since object storage offers very limited options for listing blobs, listing will be the restricting factor in how we fetch a series of incremental backups. I'm hoping we can get away with not having to create an index file, but this will largely depend on performance.

This method needs to cover two use-cases:

  1. Find the latest full bundle and refs. We can't just assume we can make a diff from the last incremental; if there is a newer full backup, then the next incremental backup should be taken from it instead.
  2. Find the latest or specified series of incremental backups starting with a full backup.

The filenames will need to be devised so that the prefix filtering provided here is efficient and does not cause too many paths to be returned.

Existing naming

When a backup is taken, the existing backup is overwritten if it exists. This naming scheme has to remain as a fallback in order to restore from old backups; however, since the names are predictable, no call to List is required.

  • <repo relative path>.bundle
  • <repo relative path>.refs
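
Since these names are fully determined by the repository path, a restore can construct them directly with no List call; a minimal sketch, assuming repoPath holds the repo relative path:

// Legacy names are predictable, so they can be built without listing.
legacyBundle := repoPath + ".bundle"
legacyRefs := repoPath + ".refs"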

Proposed naming

Full backup:

  • <repo relative path>/full_<timestamp>.bundle
  • <repo relative path>/full_<timestamp>.refs

Incremental backup:

  • <repo relative path>/inc_<full timestamp>_<n>.bundle
  • <repo relative path>/inc_<full timestamp>_<n>.refs

Unlike full backups, creating an incremental backup always requires finding the previous backup in order to get its refs. So in this case it is simple to make <n> a zero-padded integer that increases by 1. This should allow us to use plain string sorting and easily detect gaps, as sketched below.
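
As an illustration, the names could be rendered like this; a sketch only, where repoPath, the timestamp layout, and the padding width are assumptions rather than decisions made here:

// Full backups are stamped with their own creation time.
stamp := time.Now().UTC().Format("20060102150405")
fullBundle := fmt.Sprintf("%s/full_%s.bundle", repoPath, stamp)
fullRefs := fmt.Sprintf("%s/full_%s.refs", repoPath, stamp)

// Incrementals reuse the timestamp of the full backup they descend from.
// Zero-padding <n> keeps plain string sorting in increment order.
incBundle := fmt.Sprintf("%s/inc_%s_%04d.bundle", repoPath, stamp, n)
incRefs := fmt.Sprintf("%s/inc_%s_%04d.refs", repoPath, stamp, n)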

Use-case 1:

Find the latest full backup.

paths, err := sink.List(ctx, "<repo relative path>/full_")
// handle err
paths = filterExt(paths, ".bundle")
sort.Strings(paths)
if len(paths) == 0 {
	// no full backup exists yet; fall back to creating one (see below)
}
lastBackupPath := paths[len(paths)-1]

If we get nothing here, then fall back to creating a full backup.
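
filterExt is not part of the proposed Sink interface; a minimal sketch of the helper as it's used above, assuming suffix matching is sufficient:

// filterExt keeps only the paths that end with the given extension.
func filterExt(paths []string, ext string) []string {
	var filtered []string
	for _, p := range paths {
		if strings.HasSuffix(p, ext) {
			filtered = append(filtered, p)
		}
	}
	return filtered
}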

Use-case 2:

Find the series associated with the latest full backup.

// lastBackupPath from use-case 1
stamp := extractStamp(lastBackupPath)
paths, err := sink.List(ctx, fmt.Sprintf("<repo relative path>/inc_%s_", stamp))
// handle err
paths = filterExt(paths, ".bundle")
sort.Strings(paths)
paths = append([]string{lastBackupPath}, paths...)
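
extractStamp is likewise only sketched here; together with the gap detection mentioned above, it could look like the following, assuming <n> starts at 1 and the inc_<full timestamp>_<n>.bundle layout. A gap check would run on the sorted incremental paths before lastBackupPath is prepended:

// extractStamp pulls <timestamp> out of
// <repo relative path>/full_<timestamp>.bundle. Sketch only.
func extractStamp(p string) string {
	base := strings.TrimSuffix(path.Base(p), ".bundle")
	return strings.TrimPrefix(base, "full_")
}

// hasGaps reports whether the sorted incremental bundle paths skip an <n>,
// which would make the series unusable past the gap.
// Assumes <n> starts at 1.
func hasGaps(paths []string, stamp string) bool {
	for i, p := range paths {
		base := strings.TrimSuffix(path.Base(p), ".bundle")
		n, err := strconv.Atoi(strings.TrimPrefix(base, "inc_"+stamp+"_"))
		if err != nil || n != i+1 {
			return true
		}
	}
	return false
}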

Blob storage limits

GCP

https://cloud.google.com/storage/quotas#objects

There is no limit to the number of reads for objects in a bucket, which includes reading object data, reading object metadata, and listing objects.

AWS

https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html

For example, your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket. There are no limits to the number of prefixes in a bucket.
