Unify Backups into a single CLI
Context
We have different ways of running backups; they differ due to the installation methods and concerns of each one:
- When running in GDK as a development instance, you rely on a rake task.
- When running in Omnibus, there is a wrapper script which can be invoked by `gitlab-backup`.
- When running in Kubernetes / CNG, there is a different wrapper script that can be invoked by `backup-utility`, which creates a backup in a different format that is not 100% compatible with the one created in Omnibus/GDK.
You can see the documentation and links to the wrapper scripts below:
- https://docs.gitlab.com/charts/architecture/backup-restore.html
- https://docs.gitlab.com/omnibus/settings/backups.html
- https://docs.gitlab.com/ee/raketasks/backup_restore.html
Proposal
Implement a CLI in the gitlab-rails repository that handles the use cases for GDK, Omnibus, and CNG/Kubernetes using the same utility.
There is a prototype of how it could look here: !118646 (closed).
A unification would initially replicate the same behavior, so we would have different "drivers/adapters" to handle object storage, which can delegate to the different commands.
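As a rough illustration of what that could look like (class and method names here are assumptions made for the sketch, not the prototype's actual API), the CLI would run the same backup steps everywhere and only swap out how the resulting archive is stored:

```ruby
require 'tmpdir'

# Illustrative sketch only: a single entry point that runs the same backup
# steps regardless of install method, and delegates storage to an adapter.
module Backup
  class CLI
    def initialize(storage_adapter:)
      @storage_adapter = storage_adapter
    end

    # Builds the archive the same way for GDK, Omnibus and CNG, then hands it
    # to whichever adapter was configured (local disk, gsutil, s3cmd, azcopy).
    def create(backup_dir:, remote_url: nil)
      archive = build_tar_archive(backup_dir)
      @storage_adapter.upload(archive, remote_url) if remote_url
      archive
    end

    private

    def build_tar_archive(backup_dir)
      archive = File.join(Dir.tmpdir, "#{Time.now.to_i}_gitlab_backup.tar")
      system('tar', '-cf', archive, '-C', backup_dir, '.', exception: true)
      archive
    end
  end
end
```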
Unifying the experience in a single location opens it up for future improvements in how backups are created/handled without having to duplicate the effort.
To be able to ship this in smaller iterations, we could first handle the Omnibus use case, which should be very similar to the prototype and would already be a drop-in replacement. For CNG, we can slowly "eat" the wrapper by making it use the new tool and replacing each of its quirks, rewriting the wrapper to use the new tool's flags/API instead. When we have feature parity, we can ship both alongside each other, deprecate `backup-utility`, and remove it later in a major release.
Existing challenges
In CNG/Kubernetes there is only object storage. Also, due to the scale of the instances that run there, it was preferred to rely on cloud provider utilities to handle interaction with the different object storage providers. As can be seen in the existing implementation (https://gitlab.com/gitlab-org/build/CNG/-/blob/master/gitlab-toolbox/scripts/bin/backup-utility#L75-85), it delegates transfers to `azcopy`, `gsutil`, and `s3cmd`.
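Under a unified CLI, that delegation could live behind small adapters, one per provider tool. A minimal sketch, using the same hypothetical adapter interface as above (the exact flags would need to mirror what the wrapper passes today):

```ruby
# Illustrative sketch only: each adapter shells out to the provider CLI the
# CNG wrapper already relies on. The arguments shown are the basic copy/put
# forms of each tool, not the wrapper's exact invocations.
module Backup
  module ObjectStorage
    class Gsutil
      def upload(local_path, remote_url)
        system('gsutil', 'cp', local_path, remote_url, exception: true)
      end
    end

    class S3cmd
      def upload(local_path, remote_url)
        system('s3cmd', 'put', local_path, remote_url, exception: true)
      end
    end

    class Azcopy
      def upload(local_path, remote_url)
        system('azcopy', 'copy', local_path, remote_url, exception: true)
      end
    end
  end
end
```

Selecting the adapter from configuration would keep the rest of the backup flow identical across install methods.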
Having it in the same codebase means we test it at the source and ship the same experience no matter how we decide to install/run it.
Another difference in the CNG implementation is in the final backup information file. This is what I have in a backup created using the rake task:
```yaml
:db_version: '20230420144418'
:backup_created_at: !ruby/object:ActiveSupport::TimeWithZone
  utc: 2023-04-25 14:50:30.128469000 Z
  zone: !ruby/object:ActiveSupport::TimeZone
    name: Etc/UTC
  time: 2023-04-25 14:50:30.128469000 Z
:gitlab_version: 16.0.0-pre
:tar_version: 'bsdtar 3.5.3 - libarchive 3.5.3 zlib/1.2.11 liblzma/5.0.5 bz2lib/1.0.8 '
:installation_type: source
:skipped:
:repositories_storages:
:repositories_paths:
```
while those parameters don't quite match how the file is generated by the CNG script:
```shell
function write_backup_info(){
  cat << EOF > $backups_path/backup_information.yml
:db_version: $(gitlab-rails runner "File.write('/tmp/db_version', ActiveRecord::Migrator.current_version.to_s)" && cat /tmp/db_version)
:backup_created_at: $(date "+%Y-%m-%d %H:%M:%S %z")
:gitlab_version: $(get_version)
:tar_version: $(tar --version | head -n 1)
:installation_type: gitlab-helm-chart
:skipped: $1
EOF
}
```
Those discrepancies are hard to test for when they live in separate codebases.
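One way the unified CLI could remove this class of discrepancy is a single metadata writer that every install method uses. A minimal sketch, assuming it runs inside the gitlab-rails application (so `ActiveRecord::Migrator`, `Gitlab::VERSION` and `Time.zone` are available) and with the class name invented for illustration:

```ruby
require 'yaml'

# Illustrative sketch only: one place that writes backup_information.yml so
# the keys and value formats are identical for GDK, Omnibus and CNG.
module Backup
  class Metadata
    def self.write(backups_path, installation_type:, skipped: nil)
      info = {
        db_version: ActiveRecord::Migrator.current_version.to_s,
        backup_created_at: Time.zone.now,
        gitlab_version: Gitlab::VERSION,
        tar_version: `tar --version`.lines.first.to_s.strip,
        installation_type: installation_type,
        skipped: skipped
      }

      # Symbol keys serialize as `:db_version:` and so on, matching the
      # rake task's existing output shown above.
      File.write(File.join(backups_path, 'backup_information.yml'), info.to_yaml)
    end
  end
end
```

The `installation_type` value would then come from configuration rather than being hard-coded per wrapper.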