We should update the Back up and restore large reference architectures documentation to recommend server-side backups as the primary option, instead of standing up an Omnibus VM. Standing up an Omnibus VM should be presented as a secondary option.
Why
We want to make this change for several reasons:
Server-side backups avoid intermediate storage and processing by pushing the backup directly from the Gitaly nodes to object storage
Server-side backups remove the need to stand up an additional Omnibus node that is not part of the prescribed reference architecture
Sampath Ranasinghe changed title from Geo: Update docs for backing up large architectures with instructions for server-side backups to Backups: Update docs for backing up large architectures with instructions for server-side backups
Looks like the backup-utility needs to be extended to support server-side backups. I created an issue for this: #438393 (closed). Thanks @nwestbury for highlighting this.
My initial test of backup/restore with Gitaly server-side incremental backups on a 1k environment succeeded yesterday. It looked like this:
# All GitLab services on one node
# Create a repo
# Take a full backup. Use Gitaly server-side backup for Git repos
sudo gitlab-backup create REPOSITORIES_SERVER_SIDE=true
# Update the repo's README.md (1st)
# Restore the full backup
sudo gitlab-ctl stop puma
sudo gitlab-ctl stop sidekiq
sudo gitlab-backup restore BACKUP=1707364841_2024_02_08_16.9.0-pre
sudo gitlab-ctl restart
# Notice that the README.md edit (1st) was reverted as expected
# Update the repo's README.md (2nd)
# Take an incremental backup
sudo gitlab-backup create REPOSITORIES_SERVER_SIDE=true INCREMENTAL=yes PREVIOUS_BACKUP=1707364841_2024_02_08_16.9.0-pre
# Update the repo's README.md (3rd)
# Restore the incremental backup
sudo gitlab-ctl stop puma; sudo gitlab-ctl stop sidekiq
sudo gitlab-backup restore BACKUP=1707367012_2024_02_08_16.9.0-pre
sudo gitlab-ctl restart
# Notice that the README.md edit (2nd) is present, but (3rd) was reverted as expected
I am currently attempting to reconfigure the environment to use a full-blown Gitaly Cluster (and a separate Postgres node, though not a Patroni cluster). And I'll retry the same kind of tests.
@proglottis I figure that after taking many incremental server-side backups, you need to take a fresh full one.
If you take a full server-side backup, how does it interact with existing backups?
Do you have any suggestions as to when to consider doing so?
Also I just noticed that the existing document I wrote tells sysadmins to run the incremental backup command in a cronjob... but PREVIOUS_BACKUP permanently references the first full backup.
I assume that in that case, new increments will be created alongside older ones, but each increment will be bigger than the last?
Any ideas for creating new increments on top of the previous increment?
If you take a full server-side backup, how does it interact with existing backups?
@mkozono It doesn't interact at all. New backup files are written with the backup ID as reference. It does not find or read any existing backups.
Do you have any suggestions as to when to consider doing so?
Basically the more increments, the slower the restore will become. The trade-off is time to backup vs time to restore.
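One way to act on this trade-off is to cap the length of the increment chain and fall back to a full backup once the cap is reached. The following is only a sketch: `MAX_INCREMENTS`, the `backup_command` helper, and the idea of passing in the increment count are hypothetical, and the `gitlab-backup` arguments are the ones already used in this thread. The helper prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: cap the number of increments so restore time stays bounded.
# MAX_INCREMENTS and the way the count is supplied are assumptions made
# for illustration; they are not part of gitlab-backup.
set -euo pipefail

MAX_INCREMENTS=6   # hypothetical threshold, tune against measured restore time

# backup_command INCREMENTS_SINCE_FULL PREVIOUS_BACKUP_ID
# Prints the backup command to run rather than running it.
backup_command() {
  local increments="$1" previous="$2"
  if [ "$increments" -ge "$MAX_INCREMENTS" ]; then
    # The chain is long enough: start over with a full backup.
    echo "sudo gitlab-backup create REPOSITORIES_SERVER_SIDE=true"
  else
    # Still under the cap: add one more increment.
    echo "sudo gitlab-backup create REPOSITORIES_SERVER_SIDE=true INCREMENTAL=yes PREVIOUS_BACKUP=$previous"
  fi
}

backup_command 2 1707364841_2024_02_08_16.9.0-pre   # prints the incremental command
backup_command 6 1707364841_2024_02_08_16.9.0-pre   # prints the full backup command
```

A lower cap favors faster restores at the cost of more frequent full backups. A higher cap does the opposite.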
Also I just noticed that the existing document I wrote tells sysadmins to run the incremental backup command in a cronjob... but PREVIOUS_BACKUP permanently references the first full backup.
This doesn't really work with server-side backups. PREVIOUS_BACKUP chooses the backup to extract locally, and the increment is then created on top of it. The problem is that with server-side backups, this extracted backup has no repository backups in it.
When you create an increment, Gitaly has no idea about PREVIOUS_BACKUP; it just finds the latest backup. For local backups, this is the latest backup contained in the PREVIOUS_BACKUP tar, since that is what got extracted. For server-side backups, it is the latest backup in object storage, since all backups live there and the lookup is not limited to the extracted tar.
With manifest files, when creating an increment, the latest manifest file is read (it's called +latest.toml). This manifest lists the bundle files needed to restore. So to create an increment, it simply appends a new step and writes the modified manifest to both the new backup ID and to latest.
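The manifest behavior described above can be modeled with a few files in a temporary directory. This is a toy sketch, not Gitaly's real layout: the one-step-per-line manifest, the `full_backup` and `incremental_backup` helpers, and the backup IDs are all hypothetical. It also assumes that a full backup points latest at its own manifest.

```shell
#!/usr/bin/env bash
# Toy model of the manifest chain. The directory layout and the
# one-step-per-line manifest format are hypothetical simplifications,
# not Gitaly's real storage format.
set -euo pipefail

STORE="$(mktemp -d)"   # stands in for the object storage bucket

# full_backup ID: start a fresh chain without reading any existing backup.
# Assumption: a full backup also becomes the new "latest".
full_backup() {
  local id="$1"
  mkdir -p "$STORE/$id"
  echo "$id/001.bundle" > "$STORE/$id/manifest"
  cp "$STORE/$id/manifest" "$STORE/latest"
}

# incremental_backup ID: read the latest manifest, append one step, and
# write the result under both the new backup ID and "latest".
incremental_backup() {
  local id="$1" steps
  steps="$(wc -l < "$STORE/latest")"
  mkdir -p "$STORE/$id"
  cp "$STORE/latest" "$STORE/$id/manifest"
  printf '%s/%03d.bundle\n' "$id" "$((steps + 1))" >> "$STORE/$id/manifest"
  cp "$STORE/$id/manifest" "$STORE/latest"
}

full_backup backup_a
incremental_backup backup_b
incremental_backup backup_c

# A restore replays every step listed in the manifest, in order.
cat "$STORE/latest"
```

In this model, each increment extends the previous increment's chain rather than the first full backup's, and the manifest stored under an older backup ID still describes exactly the steps needed to restore to that point.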
@proglottis Ah, thank you so much for correcting my assumptions. I will incorporate this into my doc MR.
Regarding PREVIOUS_BACKUP=, you are saying that it actually had no effect in #435298 (comment 1767151306), therefore I can simply remove it, correct me if I'm wrong? And the Gitaly server-side incremental backups will act as expected.
Regarding PREVIOUS_BACKUP=, you are saying that it actually had no effect in #435298 (comment 1767151306), therefore I can simply remove it, correct me if I'm wrong?
PREVIOUS_BACKUP still chooses a tar file for backup.rake to extract. It just doesn't affect server-side. I guess since nothing else in the backup does incremental, we could remove this requirement eh?