Sporadic failures with chacha20-poly1305@openssh.com

At GitHub, we occasionally see reports of problems with Git clone and fetch operations where the server is using libssh (a recent HEAD) and the cipher is chacha20-poly1305@openssh.com. The failures look like this:

error: inflate: data stream error (invalid distance too far back)
fatal: pack has bad object at offset 1074398512: inflate returned -3
fatal: protocol error: bad line length character: ¹Ñ>ê
fatal: index-pack failed

That essentially means that the data on the connection is being corrupted somehow, which should not be possible with an AEAD cipher, since a corrupted packet should fail authentication instead of being passed along. The problem appears around the 1 GiB mark, which suggests that rekeying may be involved. Further evidence is that using aes128-ctr, which has a 4 GiB rekey limit, doesn't trigger the problem.
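As a quick check on the "around the 1 GiB mark" observation, the sketch below compares the pack offset from the error output with 1 GiB. The helper name is ours, and the pack offset is only a rough proxy for the number of bytes sent over the SSH channel, since Git's protocol framing adds overhead that is not counted here.

```python
GIB = 1 << 30  # 1 GiB in bytes

# Offset reported by "fatal: pack has bad object at offset ..." above.
PACK_OFFSET = 1074398512


def bytes_past_gib(offset, gib=GIB):
    """Return how far past the 1 GiB boundary the given offset lies."""
    return offset - gib


if __name__ == "__main__":
    delta = bytes_past_gib(PACK_OFFSET)
    # The pack offset ignores protocol framing, so this is an approximation.
    print(f"{delta} bytes (~{delta / 1024:.0f} KiB) past 1 GiB")
```

The bad object sits well under 1 MiB past the boundary, which is consistent with, but does not prove, a problem at the rekey point.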

However, we're unable to reproduce this problem, and we've tried the following:

  • Fetching and cloning large amounts of data (2-5 GiB) with chacha20-poly1305@openssh.com
  • Setting the rekey limit on the client side (OpenSSH) to an absurdly low amount (1-2 MiB)
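For anyone who wants to repeat these attempts, here is a minimal sketch of how the client side can be set up. It relies on Git's GIT_SSH_COMMAND environment variable and OpenSSH's -c and RekeyLimit options; the function names, the repository URL, and the destination path are placeholders of ours, not the exact commands we ran.

```python
import os
import shlex
import subprocess

CIPHER = "chacha20-poly1305@openssh.com"


def build_ssh_command(cipher=CIPHER, rekey_limit="1M"):
    """Build an ssh command line that pins the cipher and forces early rekeying."""
    args = ["ssh", "-c", cipher, "-o", f"RekeyLimit={rekey_limit}"]
    return " ".join(shlex.quote(a) for a in args)


def clone_with_forced_rekey(url, dest, rekey_limit="1M"):
    """Run `git clone` over SSH with the pinned cipher; return git's exit code."""
    env = dict(os.environ, GIT_SSH_COMMAND=build_ssh_command(rekey_limit=rekey_limit))
    return subprocess.run(["git", "clone", url, dest], env=env, check=False).returncode


if __name__ == "__main__":
    # Hypothetical URL and path; substitute a large repository on the affected server.
    print(build_ssh_command())
    # clone_with_forced_rekey("git@example.com:org/large-repo.git", "/tmp/large-repo")
```

Passing "aes128-ctr" as the cipher gives the comparison case that does not show the failure.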

We don't believe this is a problem on the Git side, since changing the cipher should not affect Git's behavior and the repositories appear to be in good health. We also see this on a variety of client systems: several Windows systems as well as CentOS 7, all using OpenSSH.

We're wondering if folks have seen this problem before or might have suggestions on how to debug.