Skip to content

ssh: Handle timeout waiting on packfile negotiation with proper errors

Before the introduction of 3c6bc2db (Limit the negotiation phase for certain Gitaly RPCs, 2019-10-15), SSHUploadPack and SSHUploadArchive had been vulnerable to a TOCTOU race: the client could open a session to the server and keep it up almost indefinitely without starting the packfile negotiation yet, so that access checks have already been performed but the client can still ask for an arbitrary set of objects to be sent to it. So even if the client's access got rejected meanwhile, it could still ask for arbitrary objects after the fact by starting the packfile negotiation at whatever point the client wants to.

This TOCTOU race has been plugged by listening in on the negotiation and putting a limit on how long that phase may span. The underlying thought is that it's fine to keep the actual packfile streaming open for an arbitrary amount of time because at that point the objects have already been negotiated anyway.

We have observed some flakiness in this area though in our CI pipelines, where the timeout would sometimes return a codes.Canceled instead of our currently-assumed codes.Internal error. But taking a deeper look it is actually surprising that we'd return codes.Internal in the first place: it's a well-defined error case and we should provide proper information to the client to distinguish the case where we timed out waiting for the packfile negotiation to complete. Instead, we just return an error that is indistinguishable from other errors.

Fix this shortcoming by properly handling the case where we cancel the context ourselves due to the packfile negotiation timing out. We now return a proper error message as well as an error code to the client that makes it easy to see what's happening. Chances are that this will also fix the flakiness in our CI, but it's hard to tell.

Merge request reports