
git/stats: Expose information about multi-pack-indices

Patrick Steinhardt requested to merge pks-git-stats-expose-midx-info into master

In order to decide whether we need to update the multi-pack-index, we need to know whether it already references all packfiles in the repository. If it does, then there is no need to update it and thus we may be able to skip executing git-repack(1) completely.

Right now, though, we're not in a good position to decide whether it already covers all packfiles. There are multiple ways we could go about it:

- We might use modification times of the multi-pack-index and check
  whether there is any packfile that is newer. If so, it cannot be
  up-to-date.

- We can implement a full parser for the multi-pack-index so that we
  can exactly tell which files are referenced and which aren't.

- We can parse the number of packfiles that the multi-pack-index
  references. If it is smaller than the actual number of packfiles
  then it cannot be up-to-date.

Of these options, the last one seems to be the best compromise. It gives us interesting information, like how many packfiles have been written since we last updated the multi-pack-index, which can be useful in the context of repository statistics. Furthermore, it can be computed in constant time and without much complexity given that the multi-pack-index header is fixed-size.

Implement a parser for multi-pack-index headers, which will be used to expose more information about multi-pack-indices via our statistics and as input to our future geometric repacking strategy.

Part of #4998 (closed).
