Skip to content

receive-pack: use advertised reference tips to inform connectivity check

Patrick Steinhardt requested to merge pks-connectivity-check-hide-refs into main

When serving a push, git-receive-pack(1) needs to verify that the packfile sent by the client contains all objects that are required by the updated references. This connectivity check works by marking all preexisting references as uninteresting and using the new reference tips as starting point for a graph walk.

This strategy has the major downside that it will not require any object to be sent by the client that is reachable by any of the repositories' references. While that sounds like it would be indeed what we are after with the connectivity check, it is arguably not. The administrator that manages the server-side Git repository may have configured certain refs to be hidden during the reference advertisement via transfer.hideRefs or receivepack.hideRefs. Whatever the reason, the result is that the client shouldn't expect that any of those hidden references exists on the remote side, and neither should they assume any of the pointed-to objects to exist except if referenced by any visible reference. But because we treat all local refs as uninteresting in the connectivity check, a client is free to send a packfile that references objects that are only reachable via a hidden reference on the server-side, and we will gladly accept it.

Besides the correctness issue there is also a performance issue. Git forges tend to do internal bookkeeping to keep alive sets of objects for internal use or make them easy to find via certain references. These references are typically hidden away from the user so that they are neither advertised nor writeable. At GitLab, we have one particular repository that contains a total of 7 million references, of which 6.8 million are indeed internal references. With the current connectivity check we are forced to load all these references in order to mark them as uninteresting, and this alone takes around 15 seconds to compute.

We can fix both of these issues by changing the logic for stateful invocations of git-receive-pack(1) where the reference advertisement and packfile negotiation are served by the same process. Instead of marking all preexisting references as unreachable, we will only mark those that we have announced to the client.

Besides the stated fix to correctness this also provides a huge boost to performance in the repository mentioned above. Pushing a new commit into this repo with transfer.hideRefs set up to hide 6.8 million of 7 refs as it is configured in Gitaly leads to an almost 7.5-fold speedup:

Benchmark 1: main
  Time (mean ± σ):     29.902 s ±  0.105 s    [User: 29.176 s, System: 1.052 s]
  Range (min … max):   29.781 s … 29.969 s    3 runs

Benchmark 2: pks-connectivity-check-hide-refs
  Time (mean ± σ):      4.033 s ±  0.088 s    [User: 4.071 s, System: 0.374 s]
  Range (min … max):    3.953 s …  4.128 s    3 runs

Summary
  'pks-connectivity-check-hide-refs' ran
    7.42 ± 0.16 times faster than 'main'

Unfortunately, this change comes with a performance hit when refs are not hidden. Executed in the same repository:

Benchmark 1: main
  Time (mean ± σ):     45.780 s ±  0.507 s    [User: 46.908 s, System: 4.838 s]
  Range (min … max):   45.453 s … 46.364 s    3 runs

Benchmark 2: pks-connectivity-check-hide-refs
  Time (mean ± σ):     49.886 s ±  0.282 s    [User: 51.168 s, System: 5.015 s]
  Range (min … max):   49.589 s … 50.149 s    3 runs

Summary
  'main' ran
    1.09 ± 0.01 times faster than 'pks-connectivity-check-hide-refs'

This is probably caused by the overhead of reachable tips being passed in via git-rev-list(1)'s standard input, which seems to be slower than reading the references from disk.

It is debatable what to do about this. If this were only about improving performance then it would be trivial to make the new logic depend on whether or not transfer.hideRefs has been configured in the repo. But as explained this is also about correctness, even though this can be considered an edge case. Furthermore, this slowdown is really only noticeable in outliers like the above repository with an unreasonable amount of refs. The same benchmark in linux-stable.git with about 4500 references shows no measurable difference:

Benchmark 1: main
  Time (mean ± σ):     375.4 ms ±  25.4 ms    [User: 312.2 ms, System: 155.7 ms]
  Range (min … max):   324.2 ms … 492.9 ms    50 runs

Benchmark 2: pks-connectivity-check-hide-refs
  Time (mean ± σ):     374.9 ms ±  36.9 ms    [User: 311.6 ms, System: 158.2 ms]
  Range (min … max):   319.2 ms … 583.1 ms    50 runs

Summary
  'pks-connectivity-check-hide-refs' ran
    1.00 ± 0.12 times faster than 'main'

Let's keep this as-is for the time being and accept the performance hit. It is arguably extremely noticeable to a user if a push now performs 7.5 times faster than before, but a lot less so in case an already-slow push becomes about 10% slower.

Signed-off-by: Patrick Steinhardt ps@pks.im

Edited by John Cai

Merge request reports