repository: Drop size calculations via git-cat-file(1) (!5511) · Merge requests · GitLab.org / gitaly

Patrick Steinhardt requested to merge pks-repository-size-drop-catfile-calculation into master Mar 15, 2023

We have three different ways to calculate the repository size:

- The plain old way of using du(1) on the repository. This is fast,
  but it also counts repository metadata, unreachable objects and
  duplicate objects.

- The implementation based on git-rev-list(1). It allows us to count
  only objects that are reachable via specific references and is
  overall the most flexible approach to calculating the size.
  The downside is that it is really slow compared to du(1).

- The implementation based on git-cat-file(1). This was added
  as a compromise between the other two options so that we can at
  least account for duplicate objects. It's faster than using
  git-rev-list(1), but still slower than du(1).

Although the latter two options were implemented quite a while ago, we still haven't managed to roll them out due to performance concerns. By now it is clear, though, that we have to choose between the correct and flexible approach and the fast and dirty one. This means that in the long run, our target architecture is going to be the approach based on git-rev-list(1).

This leaves the third implementation, based on git-cat-file(1), on the chopping block. It was implemented after we had seen how slow the git-rev-list(1)-based approach is, as a middle ground that would at least avoid counting duplicate objects while being reasonably fast. But it did not deliver on that promise: it still wasn't fast enough without also changing the overall architecture of how repository sizes are calculated. So this approach is disfavoured nowadays.

Remove the git-cat-file(1) based implementation. We are not going to use it anyway.

Changelog: removed

