
WIP sketch bundle snapshots

Jacob Vosmaer requested to merge jv-sketch-snapshot into master

2019-05-29: video demo at https://youtu.be/3eTJH7hPJqc


This is a sketch of a custom git bundle format that could be used for a linear sequence of repo snapshots. We could use a system like this to make incremental backups of individual Git repositories. Because each backup is a single blob this should be a good fit for object storage. This could also be a part of a replication mechanism where you'd use an object storage bucket as a sort of replication log. It's unclear how fast that would be, i.e. it may or may not be suitable for a Gitaly HA system.

This is just a Git primitive; a full backup solution would need some sort of coordinator that makes the backups. It's a possible puzzle piece for an undetermined larger puzzle.

After a restore you probably want to run git repack -ad immediately to reduce the number of packfiles; otherwise the repo is left with many packfiles, which is not an optimal state for active use.

It would be possible to periodically compact bundle files by restoring up to a certain point in time and then taking a full snapshot of the result. This would reduce the number of packfiles in object storage and evict unreachable objects.
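A rough sketch of what such a compaction step could look like, not something this MR implements. The function name compact_bundles and the GITALY_BUNDLE_APPLY / GITALY_BUNDLE_CREATE override variables are made up for illustration. It also assumes that gitaly-bundle-create, like gitaly-bundle-apply, is run from inside the repository directory.

```shell
#!/bin/sh
# compact_bundles OUT BUNDLE...
# Hypothetical helper: replays the given bundles, in the order given, into a
# scratch bare repo and writes one full snapshot of the result to OUT.
compact_bundles() {
    out=$1
    shift
    # Tool names can be overridden; the defaults are the commands from this MR.
    apply=${GITALY_BUNDLE_APPLY:-gitaly-bundle-apply}
    create=${GITALY_BUNDLE_CREATE:-gitaly-bundle-create}
    scratch=$(mktemp -d) || return 1
    status=0

    git init --bare "$scratch/repo.git" >/dev/null || status=1

    if [ "$status" -eq 0 ]; then
        for bundle in "$@"; do
            # The redirect sits outside the subshell, so a relative bundle
            # path is resolved before the cd happens.
            (cd "$scratch/repo.git" && "$apply") < "$bundle" || { status=1; break; }
        done
    fi

    if [ "$status" -eq 0 ]; then
        # /dev/null on stdin asks for a full backup, as in the Create example.
        # Assumption: the tool operates on the repo in the current directory.
        (cd "$scratch/repo.git" && "$create" < /dev/null) > "$out" || status=1
    fi

    rm -rf "$scratch"
    return "$status"
}
```

For example, compact_bundles compacted-0.gitaly-bundle snapshot-0.gitaly-bundle snapshot-1.gitaly-bundle snapshot-2.gitaly-bundle would fold the first three snapshots into one full bundle. Objects that were unreachable at that point are left out because the full snapshot is taken from the restored repo rather than concatenated from the old bundles.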

If your repository has a very large number of refs this backup approach will progressively get slower because each bundle includes a full ref dump. FWIW, this is no worse than how the v0 Git transport protocol works. You can think of these bundles as git pushes.

Create:

# full backup
gitaly-bundle-create < /dev/null > snapshot-0.gitaly-bundle

# incremental
gitaly-bundle-create < snapshot-0.gitaly-bundle > snapshot-1.gitaly-bundle
gitaly-bundle-create < snapshot-1.gitaly-bundle > snapshot-2.gitaly-bundle
# ...

Restore:

git init --bare repo.git
cd repo.git
gitaly-bundle-apply < /foo/snapshot-0.gitaly-bundle
gitaly-bundle-apply < /foo/snapshot-1.gitaly-bundle
# ...
git repack -ad
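With more than a handful of snapshots the restore is easier to script. The sketch below is only an illustration: the snapshot-<N>.gitaly-bundle naming scheme, the function names and the GITALY_BUNDLE_APPLY override variable are assumptions, not part of this MR. The one thing it tries to get right is ordering: a plain shell glob sorts snapshot-10 before snapshot-2, while incrementals have to be applied in sequence.

```shell
#!/bin/sh
# List snapshot bundles in DIR in numeric order (0, 1, 2, ..., 10), assuming
# the hypothetical naming scheme snapshot-<N>.gitaly-bundle.
list_snapshots() {
    dir=$1
    for f in "$dir"/snapshot-*.gitaly-bundle; do
        [ -e "$f" ] || continue          # glob matched nothing
        n=${f##*/snapshot-}              # strip directory and prefix
        n=${n%.gitaly-bundle}            # strip suffix, leaving <N>
        printf '%s %s\n' "$n" "$f"
    done | sort -n | cut -d' ' -f2-
}

# restore_all SNAPSHOT_DIR REPO
# Creates a bare repo at REPO, applies every snapshot in order, then repacks.
restore_all() {
    snapshot_dir=$1
    repo=$2
    # Tool name can be overridden; the default is the command from this MR.
    apply=${GITALY_BUNDLE_APPLY:-gitaly-bundle-apply}

    git init --bare "$repo" >/dev/null || return 1

    list_snapshots "$snapshot_dir" | (
        while IFS= read -r bundle; do
            # Redirect outside the inner subshell so a relative bundle path
            # is resolved before the cd happens.
            (cd "$repo" && "$apply") < "$bundle" || exit 1
        done
    ) || return 1

    # Collapse the per-snapshot packfiles into one.
    git -C "$repo" repack -ad
}
```

Usage would be something like restore_all /foo repo.git. The loop stops at the first bundle that fails to apply, since later incrementals depend on the earlier ones.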
Edited by Jacob Vosmaer
