localrepo: Speed up calculating size for repo with excluded alternates

When calculating a repository's size, we optionally allow the caller to
exclude the size of any object pools the repository is connected to.
This causes us to add --not --alternate-refs to the git-rev-list(1)
command, which will thus exclude all objects that are reachable by the
alternate from the disk usage calculation.

As it turns out though, we're hitting a performance edge case: we ask
git-rev-list(1) to use bitmaps to calculate the size, but in the case of
a pooled repository only the object pool itself will have a bitmap. This
means that by definition, the bitmap can only contain objects that we
wish to exclude from the disk calculations anyway. All objects that are
not reachable by the pool are thus known to not be contained in any
bitmap. Because of this, using bitmaps is extremely inefficient, as shown
by the following benchmark, which was performed in gitlab-org/gitlab:

    Benchmark 1: git rev-list --all --objects --disk-usage
      Time (mean ± σ):     13.290 s ±  0.085 s    [User: 13.023 s, System: 0.255 s]
      Range (min … max):   13.160 s … 13.355 s    5 runs

    Benchmark 2: git rev-list --all --objects --disk-usage --use-bitmap-index
      Time (mean ± σ):      3.588 s ±  0.016 s    [User: 3.326 s, System: 0.259 s]
      Range (min … max):    3.576 s …  3.616 s    5 runs

    Benchmark 3: git rev-list --not --alternate-refs --not --all --objects --disk-usage
      Time (mean ± σ):      6.828 s ±  0.056 s    [User: 6.601 s, System: 0.363 s]
      Range (min … max):    6.761 s …  6.897 s    5 runs

    Benchmark 4: git rev-list --not --alternate-refs --not --all --objects --disk-usage --use-bitmap-index
      Time (mean ± σ):     68.105 s ±  0.383 s    [User: 67.471 s, System: 0.744 s]
      Range (min … max):   67.663 s … 68.509 s    5 runs

    Summary
      'git rev-list --all --objects --disk-usage --use-bitmap-index' ran
        1.90 ± 0.02 times faster than 'git rev-list --not --alternate-refs --not --all --objects --disk-usage'
        3.70 ± 0.03 times faster than 'git rev-list --all --objects --disk-usage'
       18.98 ± 0.14 times faster than 'git rev-list --not --alternate-refs --not --all --objects --disk-usage --use-bitmap-index'

As you can see in benchmarks #1 and #2, bitmaps speed up disk usage
calculations when not using alternate references. But the use of bitmaps
degrades performance by almost a factor of 10 as soon as we use them in
combination with --alternate-refs, as shown in #4. On the other hand,
when we disable the use of bitmaps with alternate refs we are only about
twice as slow compared to not iterating over alternate refs (#3 versus
#2).

Interestingly, we never hit this issue in production until recently.
This is because of a configuration issue we have had in production: we
unconditionally set core.alternateRefsCommand="exit 0 #", which causes
us to skip over any alternate refs even when explicitly asking for them
via --alternate-refs. This is definitely unintentional as it causes us
to not honor the case where the client asks for shared objects to be
excluded from the size calculations. With a recent change, though, we
fixed this issue and started to correctly iterate over alternate refs
again, but that resulted in a 20-fold increase in latency for the
RepositorySize() RPC. So we're currently living in a world where
RepositorySize() is either broken, or where it has significant issues
with performance.

Mitigate the performance hit by not using bitmaps when the client asks
for alternate references to be excluded, but only in case the
repository has an object pool. As shown by the benchmark, this should
result in a 10x speedup compared to using bitmaps for repositories with
many refs.

Related to gitlab-com/gl-infra/production#7284.