Add ability to skip tar creation to backup-utility

Proposal

Currently the backup-utility gives the user no way to skip the tar process when creating or restoring backups.

This feature is desired because the tar process was identified as a cause of pods being evicted during the backup process, with the message:

Message: The node was low on resource: memory

This scenario was observed for a user with a very large GitLab instance.

Previous spike issue

gitlab-org/distribution/team-tasks#1182 (comment 1333407160)

Technical difficulties

We have 2 separate tools for the same job

Right now we have two backup/restore tools:

  • backup-utility: used by the GitLab chart to trigger scheduled backups via cron jobs, or manual backups via the toolbox pod.
  • GitLab rake tasks: used by Omnibus/self-compiled instances to trigger backups.

Each tool has its own code for backup/restore. For example, each handles tar with different code, so some solutions might require duplicated work to implement them in both places.

In some cases, the backup-utility triggers the existing rake tasks, for instance for the database and repository backups. Ideally, we'd like the backup-utility to delegate all its features to the GitLab rake tasks. The work to unify the code and reach feature parity is being tracked at: gitlab-org/charts/gitlab#1127

Untarred files are currently not identified

Simply supporting SKIP=tar does not work because the untarred version of the backup is not identified with a backup ID, so when pushing to object storage we would overwrite the previous backup, which is not desired.
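To illustrate the overwrite problem, here is a minimal Python sketch. It is not the backup-utility's code: the object_key helper, the file names, and the backup ID strings are all hypothetical and chosen only for illustration. It shows that without an ID prefix two backup runs map to the same object keys, while distinct ID prefixes keep them apart.

```python
from pathlib import PurePosixPath
from typing import Optional


def object_key(relative_path: str, backup_id: Optional[str] = None) -> str:
    """Build the object-storage key for one file of an untarred backup.

    Hypothetical helper: without a backup ID, every run maps the same
    file to the same key.
    """
    path = PurePosixPath(relative_path)
    if backup_id:
        path = PurePosixPath(backup_id) / path
    return str(path)


# Hypothetical file names and backup IDs, for illustration only.
files = ["db/database.sql.gz", "uploads.tar.gz"]

# Two runs without an ID produce identical keys, so the second run
# would overwrite the first in the bucket.
first_run = {object_key(f) for f in files}
second_run = {object_key(f) for f in files}
collisions_without_id = first_run & second_run

# Two runs with distinct IDs produce disjoint sets of keys.
first_named = {object_key(f, "1700000000_2023_11_14_16.5.0") for f in files}
second_named = {object_key(f, "1700086400_2023_11_15_16.5.0") for f in files}
collisions_with_id = first_named & second_named
```

Here collisions_without_id contains every file, whereas collisions_with_id is empty.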

Ideas

  1. Create untarred backups in named directories (#362981 - closed) aims to solve the unidentifiable untarred backup name. This is a good starting point, as it would make the top-level untarred folder identifiable and storable in object storage. Still, further work is needed to support pushing/downloading the folder directly. Also, the issue does not cover skipping tar for all of the GitLab components (artifacts, uploads, builds, etc.), which we should also look into. Finally, this work will have to be done for both the backup-utility and the GitLab rake tasks, unless we unify their code logic and reach feature parity.
  2. Alternatively, instead of duplicating the work, we could just support using the rake task for users who don't need object storage. Still, the backup-utility will have to store the backups somewhere, and it can't be a pod. So this may require a dedicated persistent volume for the purpose.
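As a rough sketch of what pushing a named folder directly (idea 1) could look like, the Python below walks an untarred backup directory and uploads each file under a key prefixed with the directory name. The push_backup_dir function, the upload callable, the directory name, and the file names are assumptions made for illustration. A real implementation would call whatever object storage client the backup-utility uses.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def push_backup_dir(backup_dir: Path, upload: Callable[[Path, str], None]) -> List[str]:
    """Upload every file under backup_dir as '<dir name>/<relative path>'.

    `upload` is a hypothetical stand-in for a real object storage client call.
    """
    keys = []
    for root, _dirs, names in os.walk(backup_dir):
        for name in names:
            local = Path(root) / name
            key = f"{backup_dir.name}/{local.relative_to(backup_dir).as_posix()}"
            upload(local, key)
            keys.append(key)
    return sorted(keys)


# Demo against a temporary directory; names are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    backup_dir = Path(tmp) / "1700000000_2023_11_14_16.5.0"
    (backup_dir / "db").mkdir(parents=True)
    (backup_dir / "db" / "database.sql.gz").write_bytes(b"")
    (backup_dir / "backup_information.yml").write_text("")

    uploaded: Dict[str, int] = {}  # key -> file size, in place of a bucket

    def fake_upload(local: Path, key: str) -> None:
        uploaded[key] = local.stat().st_size

    pushed_keys = push_backup_dir(backup_dir, fake_upload)
```

Because files are uploaded one at a time, no archive of the whole backup has to be built first. Restoring would need the reverse: listing the keys under the prefix and downloading each one.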