Server-side backup metrics

Add Prometheus metrics to keep track of server-side backups.

Proposed metrics

The table below outlines some metrics we can consider adding. These were tested locally via the GDK. For each metric, we can also track gl_project_path as a label attribute to identify particularly large or troublesome repositories.

Metric Example Notes
Backup duration by phase

image.png

A rolling average of the time spent in each phase of a backup. Backups have four phases:

  • writing refs
  • writing the bundle
  • writing custom hooks
  • committing the manifest

BackupRepository RPC response codes

image.png

Rate of RPC responses grouped by response code. BackupRepository emits the following codes:

  • OK
  • NotFound (for skipped backups)
  • Internal (for errors)

BackupRepository RPC response time

image.png

A rolling average of the RPC's response time, which closely approximates the time taken to back up a single repository.
Bundle upload rate

image.png

Upload rate in MB/s of bundle files into object storage.
Bundle uploads by size

image.png

Cumulative count of bundles uploaded, grouped by size. Each row represents the number of bundles uploaded with a size within that bucket. For example, 186 bundles smaller than 10 MB were uploaded.

Not sure how useful this graph will be in practice.
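
To make the bucketing concrete, here is a small sketch that assigns bundle sizes to buckets and counts them. Only the <10MB bucket comes from the example above; the other boundaries, the decimal megabyte unit, and the names `bucketFor` and `countBySize` are assumptions. Note that Prometheus histograms use cumulative "less than or equal" buckets, whereas this sketch places each bundle in exactly one bucket to match the per-row counts described above.

```go
package main

import "fmt"

// mb assumes decimal megabytes.
const mb = 1_000_000

// bucketBounds are hypothetical exclusive upper bounds in bytes.
// Only the 10 MB boundary appears in the issue text.
var bucketBounds = []int64{10 * mb, 100 * mb, 1000 * mb}

// bucketFor returns the label of the first bucket whose bound exceeds size.
func bucketFor(size int64) string {
	for _, b := range bucketBounds {
		if size < b {
			return fmt.Sprintf("<%dMB", b/mb)
		}
	}
	return fmt.Sprintf(">=%dMB", bucketBounds[len(bucketBounds)-1]/mb)
}

// countBySize tallies how many bundle sizes fall into each bucket.
func countBySize(sizes []int64) map[string]int {
	counts := map[string]int{}
	for _, s := range sizes {
		counts[bucketFor(s)]++
	}
	return counts
}

func main() {
	sizes := []int64{2 * mb, 8 * mb, 40 * mb, 2500 * mb}
	for label, n := range countBySize(sizes) {
		fmt.Printf("%s: %d\n", label, n)
	}
}
```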

Implementation plan

Edited by James Liu