
repository: Drop size calculations via git-rev-list(1)

Patrick Steinhardt requested to merge pks-repository-size-drop-rev-list into master

Almost exactly a year ago, we introduced repository size calculations via git-rev-list(1) in 0d28358d (repository: Use Size() to calculate repo size behind feature flag, 2022-03-24). We had three different goals with this new way of calculating the repository size:

- Be able to take into account reachability of objects.

- Count objects that exist in multiple packfiles only once.

- Exclude objects that are only reachable via a certain set of
  references, e.g. to ignore objects only kept alive via internal
  references like `refs/keep-around/`.

And while the new implementation could have solved these use cases, we have never been able to roll it out because it is simply too slow.

Recently, we reopened the discussion about what we want to do with repository size calculations. As part of that discussion, we realized that the various approaches we had imagined over the last few months all encoded policy into the RepositorySize() RPC instead of providing a mechanism. We did not want to tell the client the actual repository size, but rather a value that already takes policy into account. This is not good design, as Gitaly should be as policy-free as possible.

We have thus taken a step back and decided on a new direction: instead of providing a RepositorySize() RPC that returns a policy-ridden size, we want to implement two separate RPCs that empower the client to decide what the policy should actually be:

- One RPC will return detailed information about the sizes of various
  data structures as well as a "summarized" set of values that
  categorize these data structures into classes.

- A second RPC will allow clients to calculate the size of objects
  reachable via a call to git-rev-list(1) with a set of revisions.

Equipped with these RPCs, the client can then iterate on the policy with ease without us having to change the way that those RPCs work on the Gitaly side every time the policy changes.

Long story short: we are dropping the experiment to calculate the repository size via git-rev-list(1) in favor of the new approach.

Part of #5002 (closed).

Closes #4448 (closed).

Closes #4317 (closed).

Edited by Patrick Steinhardt
