housekeeping: Keep timestamp of last full repack

Patrick Steinhardt requested to merge pks-houskeeping-full-repack-timestamp into master

At the moment, we can keep track of the last time a full repack has happened in a repository by simply taking the timestamp of the oldest packfile in a repository. This only really works because we have a split between full and incremental repacks.

This is about to change soon though, once we implement support for geometric repacking of repositories. With geometric repacking, we no longer control whether old packfiles get rewritten, but instead shift that decision to Git. So the oldest packfile in the repository may have been written either by a full repack, or by a geometric repack that decided to rewrite the oldest packfile.

Now why do we care about this? Once we move towards geometric repacking, we still need to make sure that we perform a full repack every once in a while. This full repack is responsible for moving unreachable objects into a separate cruft pack so that we can still properly prune objects from repositories. But because we will no longer have direct control over the number of packfiles in a repository, the current heuristic, which uses the number of existing packfiles to decide whether a full repack is needed, will stop working.

Instead, we'll be moving to a time-based heuristic where we decide to do a full repack every once in a while, e.g. daily. This will ensure that we perform geometric repacks most of the time but still move unreachable objects out into cruft packs on a schedule that is easy to understand and explain.

As mentioned though, once geometric repacking is in place we can no longer derive when the last full repack happened from the packfiles themselves. To fix this, introduce a new timestamp file ".gitaly-full-repack-timestamp" that we write into the repository every time we are about to perform a full repack.
