localrepo: Speed up calculating size for repo with excluded alternates

When calculating a repository's size, we optionally allow the caller to exclude the size of any object pools the repository is connected to. This causes us to add --not --alternate-refs to the git-rev-list(1) command, which will thus exclude all objects from disk usage calculation that are reachable by the alternate.

As it turns out though, we're hitting a performance edge case: we ask git-rev-list(1) to use bitmaps to calculate the size, but in the case of a pooled repository only the object pool itself will have a bitmap. This means that by definition, the bitmap can only contain objects that we wish to exclude from the disk calculations anyway. All objects that are not reachable by the pool are thus known to not be contained in any bitmap. Because of this, using bitmaps is extremely inefficient, as shown by the following benchmark, which was performed in gitlab-org/gitlab:

Benchmark 1: git rev-list --all --objects --disk-usage
  Time (mean ± σ):     13.290 s ±  0.085 s    [User: 13.023 s, System: 0.255 s]
  Range (min … max):   13.160 s … 13.355 s    5 runs

Benchmark 2: git rev-list --all --objects --disk-usage --use-bitmap-index
  Time (mean ± σ):      3.588 s ±  0.016 s    [User: 3.326 s, System: 0.259 s]
  Range (min … max):    3.576 s …  3.616 s    5 runs

Benchmark 3: git rev-list --not --alternate-refs --not --all --objects --disk-usage
  Time (mean ± σ):      6.828 s ±  0.056 s    [User: 6.601 s, System: 0.363 s]
  Range (min … max):    6.761 s …  6.897 s    5 runs

Benchmark 4: git rev-list --not --alternate-refs --not --all --objects --disk-usage --use-bitmap-index
  Time (mean ± σ):     68.105 s ±  0.383 s    [User: 67.471 s, System: 0.744 s]
  Range (min … max):   67.663 s … 68.509 s    5 runs

Summary
  'git rev-list --all --objects --disk-usage --use-bitmap-index' ran
    1.90 ± 0.02 times faster than 'git rev-list --not --alternate-refs --not --all --objects --disk-usage'
    3.70 ± 0.03 times faster than 'git rev-list --all --objects --disk-usage'
   18.98 ± 0.14 times faster than 'git rev-list --not --alternate-refs --not --all --objects --disk-usage --use-bitmap-index'

As you can see in benchmarks 1 and 2, bitmaps speed up disk usage calculations when not using alternate references. But the use of bitmaps severely degrades performance by almost a factor of 10 as soon as we use them in combination with --alternate-refs, as shown in benchmark 4 (68.1 s compared to 6.8 s in benchmark 3). On the other hand, when we disable the use of bitmaps with alternate refs, we are only about twice as slow as the bitmap-accelerated calculation that does not iterate over alternate refs (6.8 s compared to 3.6 s in benchmark 2).

Interestingly, we never hit this issue in production until recently. This is because of a configuration issue we have had in production: we unconditionally set core.alternateRefsCommand=exit 0 #, which causes us to skip over any alternate refs even when explicitly asking for them via --alternate-refs. This is definitely unintentional as it causes us to not honor the case where the client asks for shared objects to be excluded from the size calculations. With a recent change though we fixed this issue and started to correctly iterate over alternate refs again, but that resulted in a 20-fold increase in latency for the RepositorySize() RPC. So we're currently living in a world where RepositorySize() is either broken, or where it has significant issues with performance.

Mitigate the performance hit by not using bitmaps when the client asks for alternate references to be excluded, but only in case the repository has an object pool. As shown by the benchmark, this should result in a 10x speedup compared to using bitmaps for repositories with many refs.

Related to gitlab-com/gl-infra/production#7284 (closed).
