
Performance: repository repacking should be a side effect of replication

Problem to solve

It is important that replicas are properly repacked so that they perform well if a failover occurs. Otherwise, performance on the promoted replica could be terrible until it has received sufficient writes.

Proposal

Because replication in Gitaly HA now uses git fetch, it is both pointless and impossible to "replicate" a GarbageCollect or RepackXxx RPC call: these calls make changes that are invisible to git fetch.

However, it is important that we keep replicas in a good packed state because this can make a big difference for performance.

I propose we:

  • create heuristics that decide when to repack: #2054 (closed)
  • run the heuristics on the destination each time we replicate a repository, and repack if needed
Edited by James Ramsay (ex-GitLab)