git clone fails due to "curl 18 transfer closed with outstanding read data remaining"
ZD: https://gitlab.zendesk.com/agent/tickets/46318
Related issue: gitlab-org/gitlab-ci-multi-runner#1587
A similar problem happened today on the www-gitlab-com build: https://gitlab.com/gitlab-com/www-gitlab-com/builds/6210908
Observed Behavior
- When the customer attempts 10 concurrent HTTP clones via Jenkins of a fairly large repo (~2.5 GB), some of them fail with the error message:
    error: RPC failed; curl 18 transfer closed with outstanding read data remaining
    fatal: The remote end hung up unexpectedly
    fatal: early EOF
    fatal: index-pack failed
  When a clone fails, it appears to fail quickly. In the case below, the failure comes about 300 ms into the transfer:
- When the customer tries 1 clone at a time, things work fine.
- When the customer tries the same 10 concurrent clones against a small repo, things work fine.
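To try reproducing this outside Jenkins, the concurrent clones could be driven by a small script. The following is only a sketch: the function names, the default of 10 parallel clones, and the string used to detect the failure are assumptions taken from the error text above, not from the customer's actual setup.

```python
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Substring of the git error reported above; assumed stable enough to match on.
CURL_18_MARKER = "curl 18 transfer closed"


def build_clone_command(repo_url, dest):
    """Return the argv for a plain HTTP(S) clone into dest."""
    return ["git", "clone", "--quiet", repo_url, str(dest)]


def is_curl_18_failure(stderr):
    """True if git's stderr contains the curl 18 error text."""
    return CURL_18_MARKER in stderr


def clone_once(repo_url, dest):
    """Run one clone and return (exit code, stderr)."""
    result = subprocess.run(
        build_clone_command(repo_url, dest),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


def run_concurrent_clones(repo_url, count=10):
    """Start `count` clones at once and return how many hit curl 18.

    Clones go into a temporary directory that is removed afterwards, so the
    client needs roughly count * repo size of free space while this runs.
    """
    with tempfile.TemporaryDirectory() as root:
        dests = [Path(root) / f"clone-{i}" for i in range(count)]
        with ThreadPoolExecutor(max_workers=count) as pool:
            results = list(pool.map(lambda d: clone_once(repo_url, d), dests))
    return sum(
        1 for code, err in results if code != 0 and is_curl_18_failure(err)
    )
```

Calling `run_concurrent_clones(url)` with the URL of the large repo, and then again with `count=1` or with the small repo, should show whether the failure tracks concurrency and repo size as described.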
The gitlab-workhorse logs show a "broken pipe" error:
If we imagine the connection path looks like:
client <---> nginx <----> workhorse
The error message suggests workhorse got a broken pipe either from nginx or the client.
The client error, in turn, suggests that it received an unexpected or empty chunk of data.
What we tried
- Changing the http.postBuffer size to a large value
- Moving the gitlab-workhorse and UNIX sockets to a different partition. We observed that disk usage on the root partition shot up to 100% while these clones were in progress, which suggests that running out of disk space may lead to these failures. However, we still ran into the problem even after moving the UNIX sockets.
I tried cloning a Linux repo multiple times but did not see disk space budge.
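To check whether the disk really fills up during the clones, usage could be sampled on the GitLab host while they run. This is a minimal sketch; the path, interval, and sample count are placeholder assumptions, and the percentage is computed as used/total, which can differ slightly from what `df` reports.

```python
import shutil
import time


def used_percent(path="/"):
    """Return the used share of the filesystem containing path, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def watch_disk(path="/", interval=1.0, samples=5):
    """Sample used_percent(path) `samples` times, `interval` seconds apart."""
    readings = []
    for _ in range(samples):
        readings.append(used_percent(path))
        time.sleep(interval)
    return readings
```

Running `watch_disk("/", interval=1.0, samples=60)` while the 10 clones are in flight would give a minute of readings to compare against the time of the failure.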
/cc: @MrChrisW, @dblessing