Discussion : Backup Cli
Context
We are building a new backup solution which will initially be cli driven. We are initially targeting cloud-hosted deployments. All cloud vendors offer backup solutions for the different data components of GitLab: PostgreSQL, Gitaly (storage disks) and object storage. The cli tool is intended to orchestrate the backup and restore process. The intent was to lean on the vendor's tools where they meet our needs and innovate where they fall short or are not sufficiently flexible. For example, RDS backups for managed DBs meet our requirement to back up the GitLab PostgreSQL DB.
Another principle we have been following is that the backup solution will not provision infrastructure. This allows the backup cli to operate with limited privileges without the risk of creating untracked/unmanaged infra and more importantly deleting infrastructure.
During recent discussions, questions have been raised as to why we are building our own cli tool instead of using the vendor's tools along with their orchestration solution, such as an AWS backup plan. While all parties see value in having a cli for restore that incorporates WAL for both PostgreSQL and Gitaly, the reasons for having a cli for the backup are being debated. This discussion issue is to capture our deliberations on that choice and help inform whether we need to re-evaluate our approach.
Merits of each approach
Below I attempt to summarize the discussions around the merits of each approach for backups.
Merits of having a cli
- Allows us to innovate on the backup. We will be able to embrace solutions such as Gitaly WAL partition archives.
- Advantages of using WAL partition archives
- WAL partition archives allow us to selectively back up and restore individual projects. We want this capability for self-managed customers and we believe it is valuable for Cells. It may also be useful for GitLab Dedicated.
- WAL partition archives also mean we don't need to provision infrastructure.
- WAL partition archives will be the recommended backup solution for Gitaly with Raft. We expect to deploy Gitaly with Raft on Cells and it will be available to self-managed customers.
- Allows us to extend the backup to data for components not directly offered by the vendors such as Zoekt and ClickHouse. For full transparency, this is hypothetical since we haven't investigated backing up these components yet but the thesis remains that the cli offers us flexibility to innovate and extend.
- Protects against vendor lock-in. It prevents us from being limited by the vendor's solutions for components such as the DB. For example, if we decide at some point in the future that RDS no longer serves our needs and we want to explore an alternative solution, relying on an AWS backup plan will prevent us from doing so. We already have some research initiatives in progress on this front. A cli will give us the flexibility to deploy our infrastructure on any vendor without needing to re-tool.
- We can have a single solution that works across multiple vendors in a consistent fashion. Troubleshooting is easier as a result. Having bespoke solutions means each implementation will be different and have its own nuances.
- We can have a single solution that works across self-managed and SaaS.
- A simple cli tool lowers the barrier for taking backups of GitLab in a vendor's environment without the customer needing to develop expertise in the vendor's toolchain. One could argue that the customer must have this knowledge, but the reality is that there are customers who do not. Therefore, we either have to guide them by generating and maintaining our own documentation, or create a product that makes this easier for them. It is in our interest to make it easier for them.
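To make the orchestration argument above more concrete, here is a minimal sketch of how a cli could expose one consistent backup flow while delegating each component to a different mechanism. Everything in it is hypothetical and for illustration only: the `BackupTarget` interface, the `ManagedDatabaseSnapshot` and `GitalyWalArchive` classes, and the `run_backup` function are assumed names, not the actual design of the tool, and the methods return descriptive strings instead of calling any vendor API.

```python
from dataclasses import dataclass
from typing import Protocol


class BackupTarget(Protocol):
    """Hypothetical interface for one GitLab data component."""

    name: str

    def backup(self, backup_id: str) -> str:
        ...


@dataclass
class ManagedDatabaseSnapshot:
    """Assumed target that leans on the vendor's tooling (e.g. RDS backups)."""

    name: str = "postgresql"

    def backup(self, backup_id: str) -> str:
        # A real implementation would ask the vendor for a snapshot here.
        # It would not create or delete infrastructure.
        return f"{self.name}: vendor snapshot requested for {backup_id}"


@dataclass
class GitalyWalArchive:
    """Assumed target where we innovate instead of using disk snapshots."""

    name: str = "gitaly"

    def backup(self, backup_id: str) -> str:
        # A real implementation would record which WAL partition archives
        # belong to this backup.
        return f"{self.name}: WAL partition archives recorded for {backup_id}"


def run_backup(backup_id: str, targets: list[BackupTarget]) -> list[str]:
    """Run every target in order and collect one status line per component."""
    return [target.backup(backup_id) for target in targets]


if __name__ == "__main__":
    for line in run_backup("example-backup", [ManagedDatabaseSnapshot(), GitalyWalArchive()]):
        print(line)
```

In this sketch, swapping the DB vendor would mean replacing one target class while the rest of the flow stays the same, which is the flexibility the points above argue for.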
Merits of using vendor Backup plans
- It is provided by the vendor and is proven technology used by hundreds or thousands of their customers.
- We already have this deployed for backing up GitLab Dedicated.
- It is trusted and reduces friction on the compliance front for customers in regulated industries.
- Has well-established monitoring and alerting tooling around it.