Sporadic failures with chacha20-poly1305@openssh.com
At GitHub, we occasionally see reports of problems with Git clone and fetch operations where the server is using libssh (a recent HEAD) and the negotiated cipher is chacha20-poly1305@openssh.com. The failures look like this:
error: inflate: data stream error (invalid distance too far back)
fatal: pack has bad object at offset 1074398512: inflate returned -3
fatal: protocol error: bad line length character: ¹Ñ>ê
fatal: index-pack failed
That essentially means that the data stream is being corrupted somehow, which should not be possible with an AEAD cipher, since a corrupted packet should fail authentication rather than be passed along. The problem appears around the 1 GiB mark, which implies that rekeying may be involved. Further evidence for that is that using aes128-ctr doesn't trigger the problem. That cipher has a 4 GiB rekey limit.
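To make the "around the 1 GiB mark" observation concrete, here is a small arithmetic check on the offset from the index-pack error above. It is only a rough indicator: the pack offset is not the same as the number of bytes carried over the SSH channel (protocol framing adds overhead), and the helper name is made up for illustration.

```python
GIB = 1024 ** 3  # 1 GiB in bytes


def bytes_past_boundary(offset: int, boundary: int = GIB) -> int:
    """Return how far `offset` lies beyond `boundary` (negative if before it)."""
    return offset - boundary


# Offset reported by "fatal: pack has bad object at offset ..."
failing_offset = 1074398512

delta = bytes_past_boundary(failing_offset)
print(f"{delta} bytes ({delta / 1024:.1f} KiB) past 1 GiB")
```

The bad object starts well under 1 MiB after the 1 GiB boundary, which is consistent with (but does not prove) something going wrong at a rekey triggered near that point.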
However, we're unable to reproduce this problem, and we've tried the following:
- Fetch and clone large amounts (2-5 GiB of data) with chacha20-poly1305@openssh.com
- Set the rekey limit on the client side (OpenSSH) to an absurdly low amount (1-2 MiB)
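For anyone who wants to try the same reproduction, the attempts above can be sketched roughly as follows. This is not our exact test harness: it assumes an OpenSSH client and a Git version that honours the GIT_SSH_COMMAND environment variable, and the function names and the repository URL in the usage note are placeholders.

```python
import os
import subprocess


def ssh_command(cipher: str = "chacha20-poly1305@openssh.com",
                rekey_limit: str = "1M") -> str:
    """Build an ssh invocation that pins the cipher and forces frequent rekeying."""
    # Ciphers and RekeyLimit are standard ssh_config(5) options.
    return f"ssh -o Ciphers={cipher} -o RekeyLimit={rekey_limit}"


def clone_env(**kwargs) -> dict:
    """Copy the current environment and point Git at the custom ssh command."""
    env = dict(os.environ)
    env["GIT_SSH_COMMAND"] = ssh_command(**kwargs)
    return env


def clone(url: str, dest: str, **kwargs) -> int:
    """Run `git clone` with the pinned cipher and return git's exit code."""
    result = subprocess.run(["git", "clone", url, dest],
                            env=clone_env(**kwargs), check=False)
    return result.returncode
```

Calling something like `clone("git@example.com:org/large-repo.git", "/tmp/large-repo", rekey_limit="2M")` in a loop against a multi-GiB repository is the kind of test that has so far failed to trigger the corruption for us.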
We don't believe this is a problem on the Git side, since changing the cipher should not affect things and the repositories appear to be in good health. Also, we see this on a variety of client systems: several Windows systems as well as CentOS 7, all using OpenSSH.
We're wondering if folks have seen this problem before or might have suggestions on how to debug.