WIP sketch bundle snapshots
2019-05-29: video demo at https://youtu.be/3eTJH7hPJqc
This is a sketch of a custom git bundle format for a linear sequence of repository snapshots. We could use a system like this to make incremental backups of individual Git repositories. Because each backup is a single blob, this should be a good fit for object storage. It could also be part of a replication mechanism in which an object storage bucket acts as a sort of replication log. It's unclear how fast that would be, so it may or may not be suitable for a Gitaly HA system.
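This section doesn't spell out the gitaly-bundle layout itself. For comparison, stock `git bundle` can already express the core idea (a snapshot that carries only the objects that are new since the previous one) by excluding the previous bundle's ref tips. The sketch below uses that stand-in, not the custom format; the `snapshot-N.bundle` names and the temporary directories are assumptions for illustration.

```shell
#!/bin/sh
# Stand-in sketch using stock `git bundle`, NOT the gitaly-bundle format:
# snapshot-1 excludes everything reachable from snapshot-0's ref tips, so it
# only carries new objects and lists those tips as prerequisites.
set -eu

work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"
git config user.email demo@example.com
git config user.name demo

git commit -q --allow-empty -m one
# Full snapshot: all refs plus every object reachable from them.
git bundle create "$work/snapshot-0.bundle" --all 2>/dev/null

git commit -q --allow-empty -m two
# Incremental snapshot: `list-heads` prints "<oid> <refname>" per ref in the
# previous bundle; negate those oids so their history is left out.
prev_tips=$(git bundle list-heads "$work/snapshot-0.bundle" | cut -d' ' -f1)
git bundle create "$work/snapshot-1.bundle" --all --not $prev_tips 2>/dev/null

# Restore by replaying the chain in order into an empty bare repository.
git init -q --bare "$work/restore.git"
for b in "$work/snapshot-0.bundle" "$work/snapshot-1.bundle"; do
  git -C "$work/restore.git" fetch -q "$b" 'refs/*:refs/*'
done
```

Restoring has to replay the chain in order: `snapshot-1.bundle` names the commit from `snapshot-0.bundle` as a prerequisite, so it can't be applied to an empty repository on its own.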
This is just a Git primitive; a full backup solution would need some sort of coordinator that makes the backups. It's a possible piece of a larger, as yet undetermined puzzle.
After a restore you probably want to run `git repack -ad` right away to reduce the number of packfiles; otherwise the repository is not in an optimal state for active use (lots of packfiles).
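To see where the packfiles come from, here's a sketch that again uses stock `git bundle` as a stand-in (an assumption; `gitaly-bundle-apply` may behave differently). Fetching from a stock bundle indexes its pack as a separate packfile, so replaying a chain of snapshots leaves one pack per snapshot until `git repack -ad` folds them into one.

```shell
#!/bin/sh
# Sketch: fetching a chain of stock git bundles leaves one packfile per bundle;
# `git repack -ad` packs everything into a single pack and deletes the old ones.
set -eu

work=$(mktemp -d)
git init -q "$work/src"
git -C "$work/src" config user.email demo@example.com
git -C "$work/src" config user.name demo
git init -q --bare "$work/restore.git"

prev=""
for i in 0 1 2; do
  git -C "$work/src" commit -q --allow-empty -m "commit $i"
  # First bundle is full; later ones exclude the previous snapshot's tip.
  git -C "$work/src" bundle create "$work/snap-$i.bundle" --all $prev 2>/dev/null
  prev="--not $(git -C "$work/src" rev-parse HEAD)"
  git -C "$work/restore.git" fetch -q "$work/snap-$i.bundle" 'refs/*:refs/*'
done

# Extract N from the "packs: N" line of `git count-objects -v`.
packs() { git -C "$work/restore.git" count-objects -v | sed -n 's/^packs: //p'; }

before=$(packs)
git -C "$work/restore.git" repack -ad -q
after=$(packs)
echo "packfiles before repack: $before, after: $after"
```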
It would be possible to periodically compact bundle files by restoring up to a certain point in time and then taking a full snapshot of the result. This would reduce the number of packfiles in object storage and evict unreachable objects.
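A sketch of that compaction step, again with stock git bundles standing in for gitaly-bundle files. `compact_chain` and the `snap-N.bundle` naming are hypothetical. The demo rewrites a commit between two snapshots, so after replay the original commit is unreachable and does not make it into the compacted bundle.

```shell
#!/bin/sh
# Sketch: compaction with stock git bundles standing in for gitaly-bundle files.
# Replay snapshots 0..K into a scratch repo, then write one full bundle that
# replaces them. Objects no longer reachable from any ref are left behind.
set -eu

# compact_chain DIR K OUT: hypothetical helper; DIR holds snap-0.bundle, ...
compact_chain() {
  scratch=$(mktemp -d)
  git init -q --bare "$scratch"
  i=0
  while [ "$i" -le "$2" ]; do
    # '+' allows non-fast-forward ref updates, e.g. after a force push.
    git -C "$scratch" fetch -q "$1/snap-$i.bundle" '+refs/*:refs/*'
    i=$((i + 1))
  done
  git -C "$scratch" bundle create "$3" --all 2>/dev/null
  rm -rf "$scratch"
}

# Demo chain: snap-1 rewrites the only commit, orphaning the original one.
work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m original
old=$(git rev-parse HEAD)
git bundle create "$work/snap-0.bundle" --all 2>/dev/null
git commit -q --amend --allow-empty -m rewritten
git bundle create "$work/snap-1.bundle" --all --not "$old" 2>/dev/null

compact_chain "$work" 1 "$work/compacted.bundle"
```

The compacted bundle has no prerequisites, so it can start a new chain on its own.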
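If your repository has a very large number of refs, this backup approach gets progressively slower as the ref count grows, because each bundle includes a full ref dump no matter how few refs changed. FWIW, this is no worse than how the v0 Git transport protocol works, where the server advertises all of its refs at the start of every fetch or push. You can think of these bundles as git pushes.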
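To put the ref-dump cost in concrete terms, this sketch uses `git for-each-ref` output as a stand-in for the ref section of a snapshot (an assumption; the actual gitaly-bundle header layout isn't described here). It creates 1000 extra refs under a hypothetical `refs/keep-around/` prefix, moves one branch, and compares the two dumps.

```shell
#!/bin/sh
# Sketch: a full ref dump per snapshot scales with the ref count, not with
# the number of refs that changed. `git for-each-ref` stands in for the dump.
set -eu

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m base

# Create 1000 extra refs in one transaction (hypothetical refs/keep-around/*).
head=$(git rev-parse HEAD)
i=0
while [ "$i" -lt 1000 ]; do
  echo "create refs/keep-around/k$i $head"
  i=$((i + 1))
done | git update-ref --stdin

dump() { git for-each-ref --format='%(objectname) %(refname)'; }
dump > "$work/refs-0"

# The next "snapshot" moves a single branch but still re-lists every ref.
git commit -q --allow-empty -m next
dump > "$work/refs-1"

lines=$(wc -l < "$work/refs-1")
changed=$(diff "$work/refs-0" "$work/refs-1" | grep -c '^>' || true)
echo "refs listed: $lines, refs that changed: $changed"
```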
Create:
# full backup
gitaly-bundle-create < /dev/null > snapshot-0.gitaly-bundle
# incremental
gitaly-bundle-create < snapshot-0.gitaly-bundle > snapshot-1.gitaly-bundle
gitaly-bundle-create < snapshot-1.gitaly-bundle > snapshot-2.gitaly-bundle
# ...
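The Create steps above always feed the previous snapshot to `gitaly-bundle-create` on stdin, or `/dev/null` for the first, full backup. A coordinator would have to work that out on every run. Here's a minimal sketch of that bookkeeping, assuming all snapshots of one repository live in a single directory with gap-free numbering; `next_snapshot` is a hypothetical helper, not part of this project.

```shell
#!/bin/sh
# Sketch of one step of a hypothetical backup coordinator: find the newest
# snapshot-N.gitaly-bundle in DIR and decide what gitaly-bundle-create should
# read on stdin and where its output should go.
set -eu

# next_snapshot DIR: prints "<input> <output>". The input is /dev/null when DIR
# has no snapshots yet, i.e. the next run is a full backup. Assumes the
# numbering has no gaps.
next_snapshot() {
  n=0
  while [ -e "$1/snapshot-$n.gitaly-bundle" ]; do
    n=$((n + 1))
  done
  if [ "$n" -eq 0 ]; then
    echo "/dev/null $1/snapshot-0.gitaly-bundle"
  else
    echo "$1/snapshot-$((n - 1)).gitaly-bundle $1/snapshot-$n.gitaly-bundle"
  fi
}
```

Usage might look like `set -- $(next_snapshot /backups/repo)` followed by `gitaly-bundle-create < "$1" > "$2"`, assuming the directory path contains no whitespace.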
Restore:
git init --bare repo.git
cd repo.git
gitaly-bundle-apply < /foo/snapshot-0.gitaly-bundle
gitaly-bundle-apply < /foo/snapshot-1.gitaly-bundle
# ...
git repack -ad
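With a long chain, listing each apply step by hand is error-prone, and a plain `snapshot-*.gitaly-bundle` glob sorts lexically, so `snapshot-10` would come before `snapshot-2`. A minimal sketch of a loop that applies snapshots in numeric order and stops at the first failure, since each increment presumably depends on the ones before it; `apply_snapshots` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: replay snapshot-0, snapshot-1, ... in numeric order rather than
# glob order, stopping at the first missing number or failed apply.
set -eu

# apply_snapshots DIR CMD...: feeds each snapshot to CMD on stdin. In practice
# CMD would be gitaly-bundle-apply, run from inside the target repository.
apply_snapshots() {
  dir=$1
  shift
  n=0
  while [ -e "$dir/snapshot-$n.gitaly-bundle" ]; do
    "$@" < "$dir/snapshot-$n.gitaly-bundle" || return 1
    n=$((n + 1))
  done
}
```

For example, from inside `repo.git`: `apply_snapshots /foo gitaly-bundle-apply && git repack -ad`.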