
repository: Fix indeterministic voting when creating new repos

When creating repositories, we use transactional voting to verify that the repository has been created identically on all nodes that are part of the transaction. This voting happens after we have seeded the repository, and the vote is computed by walking through the repository's directory and hashing all of its files. We need to be careful, though, to skip files which we know to be indeterministic:

- FETCH_HEAD may contain URLs which are different for each of the
  nodes.

- Object packfiles contained in the object database are not
  deterministic, mostly because Git may use multiple threads to
  compute deltas.

Luckily, we do not have to rely on either of these types of files in order to ensure that the user-visible state of the repository is the same, so we can indeed just skip them.

While we already have the logic to skip these files, this logic didn't work correctly because we embarrassingly forgot to actually return fs.SkipDir when we encounter the object directory. So even though we thought we skipped these files, in reality we didn't.

This bug has been manifesting in production in the form of CreateFork regularly failing to reach quorum, seemingly at random, on a subset of nodes. The root cause is that we use git-clone(1) to seed the repository contents of the fork, which triggers exactly the case of indeterministic packfiles noted above. So any successful CreateFork RPC call really only succeeded by pure luck.

Fix this issue by correctly skipping over "objects" directories. While at it, fix how we skip over FETCH_HEAD by returning nil: it's a file and not a directory, so it doesn't make much sense to return fs.SkipDir.

Changelog: fixed

Closes #4100 (closed)
