
Investigate required configuration for Cloud-based backups

Context

With Portable Backups, we rely on the existing configuration information retrieved by SourceContext and OmnibusContext. These provide the tool with the required connection parameters and credentials, as well as the locations where blobs are stored.

In Cloud-based backups, we instead need to integrate with each Cloud vendor.

Here we care about the following things:

  • Which managed service stores each specific data type
  • What actions we need to perform on each one to preserve data
  • What actions we need to perform on each one to restore data
  • How to store a reference to each service's data that is part of a backup session
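A minimal sketch of how these four concerns could be modeled, assuming a simple in-memory structure. `ServiceBinding`, `BackupSession`, and action names such as `copy_bucket` are hypothetical and do not come from the existing tool: each binding links a data type to the service that stores it and to its backup and restore actions, and a session records one reference per preserved data type.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ServiceBinding:
    """Hypothetical link between one data type and the managed service that stores it."""
    data_type: str       # e.g. "artifacts"
    service: str         # e.g. "object_storage"
    backup_action: str   # hypothetical action name, e.g. "copy_bucket"
    restore_action: str  # hypothetical action name, e.g. "copy_bucket_back"


@dataclass
class BackupSession:
    """Hypothetical record of what one backup session preserved, keyed by data type."""
    session_id: str
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # data type -> reference to the preserved copy (e.g. a prefix in the backup bucket)
    references: dict[str, str] = field(default_factory=dict)

    def record(self, binding: ServiceBinding, reference: str) -> None:
        """Store where this session preserved the data for binding.data_type."""
        if binding.data_type in self.references:
            raise ValueError(
                f"{binding.data_type} already recorded in session {self.session_id}"
            )
        self.references[binding.data_type] = reference
```

Rejecting a second reference for the same data type is one possible design choice: it keeps a session from silently overwriting a reference it has already stored.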

Proposal

Based on the MVC implementation of Object Storage Backup in #455385 (closed), identify what information the tool requires to determine what needs to be backed up, and where.

Here are some questions and suggestions to guide the work:

  • How do we identify which Object Storage endpoints we need to back up?
  • How do we link an Object Storage endpoint to a specific data type?
  • What should we do to prevent configuration mistakes?
    • Should we add a configuration validation step?
    • How can we verify that the configuration points to the correct data type (for example, that the artifacts configuration actually points to artifacts and not something else)?
  • In the initial phase, we should consider relying on the Object Storage configuration in the consolidated format: https://docs.gitlab.com/ee/administration/object_storage.html#configure-each-object-type-to-define-its-own-storage-connection-storage-specific-form
    • Do we foresee any challenges in supporting the non-consolidated format later on?
    • If each blob type has its own Object Storage, does the approach from #455385 (closed) support that model (N:1, where N is the number of source buckets and 1 is the backup bucket)?
  • What credential format do we need to access storage and to perform and restore a backup?
  • How can we validate that we have the correct credentials with the correct permissions?
    • Should we build a credential validation command?
    • Should this validation run before each backup?
    • Should this validation run before each restore?
  • Since the gitlab-backup-cli tool is decoupled from the Rails codebase, should we store the required configuration in a dedicated file for the tool instead of extracting it from the places where it may already exist?
    • Would that approach help us integrate with Kubernetes / Helm charts?
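On identifying endpoints, linking them to data types, and catching configuration mistakes: the sketch below shows one way the tool could derive source endpoints from the consolidated format and map them N:1 into a single backup bucket. It assumes the configuration has already been parsed into a dict shaped like the `object_store` section (`enabled`, `connection.provider`, `objects.<type>.bucket`). The list of object types, the handling of a per-type `enabled: false`, and the rule that two data types must not share a bucket are assumptions of this sketch and should be checked against the GitLab Object Storage documentation. `resolve_endpoints` and `backup_destinations` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed list of consolidated-format object types; verify against the GitLab docs.
KNOWN_OBJECT_TYPES = {
    "artifacts", "external_diffs", "lfs", "uploads", "packages",
    "dependency_proxy", "terraform_state", "ci_secure_files", "pages",
}


@dataclass(frozen=True)
class SourceEndpoint:
    data_type: str
    provider: str
    bucket: str


def resolve_endpoints(object_store: dict) -> list[SourceEndpoint]:
    """Turn a parsed consolidated `object_store` section into one endpoint per data type."""
    if not object_store.get("enabled", False):
        return []
    provider = object_store.get("connection", {}).get("provider")
    if not provider:
        raise ValueError("object_store.connection.provider is missing")
    endpoints: list[SourceEndpoint] = []
    owner_of_bucket: dict[str, str] = {}
    for data_type, settings in object_store.get("objects", {}).items():
        if data_type not in KNOWN_OBJECT_TYPES:
            raise ValueError(f"unknown object type: {data_type}")
        if settings.get("enabled") is False:
            continue  # this data type is not stored in Object Storage
        bucket = settings.get("bucket")
        if not bucket:
            raise ValueError(f"{data_type}: bucket is missing")
        # Assumed rule: a shared bucket is treated as a likely misconfiguration.
        if bucket in owner_of_bucket:
            raise ValueError(
                f"{data_type} and {owner_of_bucket[bucket]} share bucket {bucket}"
            )
        owner_of_bucket[bucket] = data_type
        endpoints.append(SourceEndpoint(data_type, provider, bucket))
    return endpoints


def backup_destinations(
    endpoints: list[SourceEndpoint], backup_bucket: str, session_id: str
) -> dict[str, str]:
    """N:1 mapping: each source bucket lands under its own prefix in one backup bucket."""
    return {e.data_type: f"{backup_bucket}/{session_id}/{e.data_type}/" for e in endpoints}
```

Giving each data type its own per-session prefix keeps the N source buckets separable inside the single backup bucket, so a restore could target each data type independently.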
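On validating credentials: a preflight check could compare the permissions each operation needs against what the credentials actually allow, before any data is moved. This is a minimal sketch. `StorageProbe` is a hypothetical interface (a real implementation might wrap a cloud vendor SDK and attempt a harmless list or write), and the read/write requirements per operation are assumptions.

```python
from dataclasses import dataclass
from typing import Protocol


class StorageProbe(Protocol):
    """Hypothetical interface answering whether the current credentials allow an access."""
    def can_read(self, bucket: str) -> bool: ...
    def can_write(self, bucket: str) -> bool: ...


@dataclass
class PreflightResult:
    problems: list[str]

    @property
    def ok(self) -> bool:
        return not self.problems


def preflight(
    probe: StorageProbe, source_buckets: list[str], backup_bucket: str, operation: str
) -> PreflightResult:
    """Check the permissions an operation needs (assumed requirements) before moving data."""
    if operation == "backup":
        # Read every source bucket, write the backup bucket.
        needs = [(b, "read") for b in source_buckets] + [(backup_bucket, "write")]
    elif operation == "restore":
        # Read the backup bucket, write every source bucket.
        needs = [(backup_bucket, "read")] + [(b, "write") for b in source_buckets]
    else:
        raise ValueError(f"unknown operation: {operation}")
    problems = []
    for bucket, access in needs:
        allowed = probe.can_read(bucket) if access == "read" else probe.can_write(bucket)
        if not allowed:
            problems.append(f"missing {access} access to {bucket}")
    return PreflightResult(problems)
```

Because backup and restore need opposite permissions on the same buckets, running the check with the matching `operation` before each backup and before each restore would catch credentials that only work in one direction.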
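On a dedicated configuration file: if the tool kept its own file instead of extracting settings from Rails, loading it could be as small as the sketch below. The JSON schema (`backup_bucket`, `credentials_file`, `sources`) is hypothetical. Referencing credentials by file path rather than embedding them is an assumption meant to fit Kubernetes, where a Secret can be mounted into the pod as a file.

```python
import json
from dataclasses import dataclass

REQUIRED_KEYS = ("backup_bucket", "credentials_file", "sources")


@dataclass
class ToolConfig:
    backup_bucket: str
    credentials_file: str      # path to mounted credentials, not the secret itself
    sources: dict[str, str]    # data type -> source bucket


def load_tool_config(text: str) -> ToolConfig:
    """Parse a hypothetical, Rails-independent JSON config file for the backup tool."""
    raw = json.loads(text)
    missing = [k for k in REQUIRED_KEYS if k not in raw]
    if missing:
        raise ValueError(f"missing keys: {', '.join(missing)}")
    sources = raw["sources"]
    if not isinstance(sources, dict) or not sources:
        raise ValueError("sources must be a non-empty mapping of data type to bucket")
    return ToolConfig(raw["backup_bucket"], raw["credentials_file"], dict(sources))
```

Keeping the file this explicit also makes it a natural target for a Helm chart to render, since every value the tool needs is listed in one place.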
Edited by Kyle Yetter