SPIKE: Proof of Concept for Incremental Git backups
Problem Statement
When I am operating a large GitLab instance, particularly one with multiple Gitaly nodes (sharded or HA), the backup rake task is not usable. I need reliable, up-to-date backups that I can perform at least as often as I upgrade GitLab.
Proposal
The previous exploration into improving backup speed by using concurrency has largely failed. See #241701 (closed)
We should extract the code in backup.rake that calls Gitaly RPCs and writes the files to disk into a new gitaly-backup client. I expect this executable to receive storages and repository paths over stdin and to write files directly to the local filesystem (as backup.rake does today). This will allow us to use Go for concurrency and to more easily reuse any Git-specific logic already in use in Gitaly. I expect that gitaly-backup could be used independently of gitlab-rails to back up individual repositories from the command line.
At the end of this step, backup.rake should continue to work as before, but now using gitaly-backup and with working concurrency.
Once this part is working, we'll need to decide how to progress to incremental backups, but my hope is that we can provide a fairly simple command-line interface to test this on an individual repository basis.