Improve GitLab Geo backfill so that it can be managed properly at scale

This relates to the GitLab Geo Disaster Recovery issue in #846 (closed). gitlab-org/gitlab-ee#1190 and !861 (merged) addressed the basic need for a way to bootstrap a GitLab Geo (#76 (closed)) node with copies of all repositories, but that approach only works for instances with a few repositories.

What we need now is a way for this to work better at scale (e.g. GitLab.com):

  1. Progress needs to be monitored (e.g. bookkeeping of which repositories have been successfully pushed and when, filesystem usage, network bandwidth usage, etc.).
  2. Currently all the repositories are scheduled at once. We need a way to stagger this over time so we don't saturate the filesystem and consume all network bandwidth.
  3. The ability to turn the backfill on and off.
  4. The ability to throttle it at specific times, under high load, etc.
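
To make the four requirements above concrete, here is a minimal sketch of what a staggered backfill loop could look like. It is written in Python purely for illustration and is not how GitLab Geo implements or would implement this; `BackfillScheduler`, `run_batch`, `should_throttle`, and `sync_repo` are all hypothetical names. The idea is that a periodic job calls `run_batch` once per interval instead of scheduling every repository at once.

```python
from datetime import datetime, timezone


class BackfillScheduler:
    """Hypothetical sketch: sync repositories in small batches with bookkeeping."""

    def __init__(self, repo_ids, batch_size=10, should_throttle=None):
        self.pending = list(repo_ids)      # repositories not yet attempted
        self.batch_size = batch_size       # requirement 2: stagger the work
        self.enabled = True                # requirement 3: on/off switch
        # requirement 4: caller-supplied check (time of day, load, etc.)
        self.should_throttle = should_throttle or (lambda: False)
        self.synced_at = {}                # requirement 1: repo id -> sync time
        self.failed = {}                   # requirement 1: repo id -> error text

    def run_batch(self, sync_repo):
        """Sync at most one batch; return how many repositories were attempted."""
        if not self.enabled or self.should_throttle():
            return 0
        batch = self.pending[:self.batch_size]
        self.pending = self.pending[self.batch_size:]
        for repo_id in batch:
            try:
                sync_repo(repo_id)  # assumed to raise on failure
                self.synced_at[repo_id] = datetime.now(timezone.utc)
            except Exception as exc:
                self.failed[repo_id] = str(exc)
        return len(batch)

    def progress(self):
        """Counts that a monitoring dashboard could poll."""
        return {
            "pending": len(self.pending),
            "synced": len(self.synced_at),
            "failed": len(self.failed),
        }
```

In this sketch, failed repositories are recorded but not retried, and filesystem and bandwidth measurements are left to whatever `should_throttle` checks; both would need real design decisions.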

We should also deal with replication of LFS objects. I think calling git lfs fetch --all might do this.
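
As a rough illustration of the LFS idea, the sketch below wraps `git lfs fetch --all` for a single repository. It assumes `git` and the `git-lfs` extension are installed on the node and that the remote is named `origin`; the helper names `lfs_fetch_all_command` and `fetch_lfs_objects` are hypothetical. Whether this command is sufficient for Geo replication is still the open question raised above.

```python
import subprocess


def lfs_fetch_all_command(repo_path, remote="origin"):
    """Build the argv for fetching all LFS objects into the repo at repo_path."""
    # `git -C <path>` runs the command as if started in that directory.
    return ["git", "-C", repo_path, "lfs", "fetch", "--all", remote]


def fetch_lfs_objects(repo_path, remote="origin"):
    """Run the fetch; raises subprocess.CalledProcessError if git exits non-zero."""
    subprocess.run(lfs_fetch_all_command(repo_path, remote), check=True)
```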

Thoughts @patricio, @brodock, @jacobvosmaer-gitlab, @regisF, @pcarranza?