objectpool: Fix conflicting references when fetching into pools
In order to update pool repositories we use the FetchIntoObjectPool RPC. This RPC first fetches from the primary pool member into the pool, then rescues any objects that have become dangling because of the reference updates, and finally it kicks off repository maintenance for the pool.
We have now seen multiple cases, though, where updating pool repositories consistently failed with the following error:
error: cannot lock ref 'refs/remotes/origin/heads/branch/conflict': 'refs/remotes/origin/heads/branch' exists; cannot create 'refs/remotes/origin/heads/branch/conflict'
So the reference refs/remotes/origin/heads/branch exists in the pool repository and obstructs fetching the conflicting reference refs/heads/branch/conflict from the pool member. The root cause is that we never prune references in pools, even when they have been removed on the remote side. But this condition would trigger even if we executed git fetch --prune, because we also use the --atomic flag, which doesn't cope well with a conflicting reference being deleted at the same time as the new reference is added.
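To illustrate the kind of conflict described above, here is a minimal sketch in Go. The function conflictingRefs is a hypothetical helper written for this explanation and is not part of the actual change. It only shows why a reference named "branch" and a reference named "branch/conflict" cannot coexist: one name is a path prefix of the other.

```go
package main

import (
	"fmt"
	"strings"
)

// conflictingRefs is a hypothetical helper. It returns every existing
// reference that conflicts with newRef because one of the two names is a
// path prefix of the other (a directory/file conflict).
func conflictingRefs(existing []string, newRef string) []string {
	var conflicts []string
	for _, ref := range existing {
		if strings.HasPrefix(newRef, ref+"/") || strings.HasPrefix(ref, newRef+"/") {
			conflicts = append(conflicts, ref)
		}
	}
	return conflicts
}

func main() {
	// References assumed to exist in the pool repository.
	existing := []string{
		"refs/remotes/origin/heads/branch",
		"refs/remotes/origin/heads/main",
	}

	// The reference the fetch wants to create in the pool.
	newRef := "refs/remotes/origin/heads/branch/conflict"

	for _, ref := range conflictingRefs(existing, newRef) {
		fmt.Printf("%q blocks creation of %q\n", ref, newRef)
	}
}
```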
While it sounds a bit scary, pruning references in object pools should be totally fine because of our dangling-references mechanism: after we have fetched changes from the remote, we check whether there were any force-updates that have led to objects becoming unreachable. Because some other pool members might still refer to those objects, we must make sure that they aren't deleted, and so we create dangling references to keep those objects alive. So when we start to prune references now, we recover these objects via such dangling references in exactly the same way as we already do with force-updated references.
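The idea of keeping objects alive via dangling references can be sketched as follows. This is only an illustration under stated assumptions: the refs/dangling/ namespace and the danglingRefCommands helper are hypothetical names chosen for this example, and the step that finds the unreachable objects is not shown. The sketch builds input for git update-ref --stdin that creates one reference per object.

```go
package main

import (
	"fmt"
	"strings"
)

// danglingRefCommands is a hypothetical helper. For every object ID that has
// become unreachable it emits a "create" command for git update-ref --stdin,
// so that each object is pinned by a reference named after it.
// The refs/dangling/ prefix is an assumption made for this sketch.
func danglingRefCommands(objectIDs []string) string {
	var sb strings.Builder
	for _, oid := range objectIDs {
		fmt.Fprintf(&sb, "create refs/dangling/%s %s\n", oid, oid)
	}
	return sb.String()
}

func main() {
	// Placeholder object IDs; in practice these would be the objects that
	// became unreachable after a force-update or a pruned reference.
	unreachable := []string{
		"1111111111111111111111111111111111111111",
		"2222222222222222222222222222222222222222",
	}
	fmt.Print(danglingRefCommands(unreachable))
}
```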
That still leaves the issue of using --atomic and --prune together, which doesn't work. We can't get rid of --atomic because it's an important optimization that keeps us from executing reference-transaction hooks twice for every changed reference. We can make this a two-step process, though, by first executing git remote prune to prune deleted branches without fetching any objects, and only then fetching any new references. But this again has similar ramifications because the command doesn't support --atomic and may thus perform really slowly when many references are deleted at the same point in time. So we instead use the dry-run mode of git remote prune, parse its output, and then use git-update-ref(1) to perform the change with manual voting.
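The parsing step can be sketched roughly like this. It is a simplified illustration, not the actual implementation: the helper names are hypothetical, the remote is assumed to be called origin, the dry-run output is assumed to mark pruned references with a " * [would prune] " prefix and to abbreviate them by dropping "refs/remotes/", and the voting on the resulting transaction is left out entirely.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// wouldPrunePrefix is the marker assumed to precede each reference that
// git remote prune --dry-run reports as stale.
const wouldPrunePrefix = " * [would prune] "

// parsePrunedRefs is a hypothetical helper that extracts fully qualified
// reference names from the dry-run output. Header lines are skipped because
// they do not carry the marker prefix.
func parsePrunedRefs(output string) []string {
	var refs []string
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, wouldPrunePrefix) {
			continue
		}
		// The reported name is assumed to lack the "refs/remotes/" prefix.
		refs = append(refs, "refs/remotes/"+strings.TrimPrefix(line, wouldPrunePrefix))
	}
	return refs
}

// updateRefInput builds the input for git update-ref --stdin so that all
// deletions happen in a single reference transaction.
func updateRefInput(refs []string) string {
	var sb strings.Builder
	sb.WriteString("start\n")
	for _, ref := range refs {
		fmt.Fprintf(&sb, "delete %s\n", ref)
	}
	sb.WriteString("prepare\ncommit\n")
	return sb.String()
}

func main() {
	// Example dry-run output; the URL is a placeholder.
	dryRun := "Pruning origin\n" +
		"URL: /path/to/member.git\n" +
		" * [would prune] origin/heads/branch\n"

	fmt.Print(updateRefInput(parsePrunedRefs(dryRun)))
}
```

Deleting all stale references in one transaction is what keeps this fast when many references disappear at once, which is the property git remote prune on its own lacks.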
All of this is not exactly ideal or elegant, but it works to fix the original bug as demonstrated by our tests. We should ultimately try to upstream patches to either make --atomic and --prune work nicely together, or to add an --atomic flag to git-remote(1).
Fixes #4373.