Workhorse hangs git-upload-pack if input and output payloads are large enough
Workhorse executes git-upload-pack and attempts to send data to its stdin and receive data from its stdout. In certain cases the pipes fill up, causing git fetch to freeze indefinitely. The root cause is likely that Workhorse does not attempt to read stdout until all input has been sent. Once git-upload-pack has filled its stdout pipe, it blocks on that write and stops reading stdin. The stdin pipe then fills up too, and Workhorse blocks on its write, so git-upload-pack and Workhorse deadlock waiting on each other. As a side effect, when the client aborts the fetch, the git-upload-pack processes are left behind. They are cleaned up only after Workhorse is restarted.
This relates to gitlab-org/gitlab-ce#25916 and quite possibly https://gitlab.com/gitlab-com/infrastructure/issues/941.
How to reproduce
On a Linux system, do the following:
- Run git clone --bare git@dev.gitlab.org:gitlab/gitlabhq.git (the clone has to reflect the repository's current state)
- Compile pipetest.go (attached below) with go build pipetest.go
- Download upload-pack.txt (attached below)
- Run ./pipetest
strace output
23:25:54 openat(AT_FDCWD, "upload-pack.txt", O_RDONLY|O_CLOEXEC) = 4
23:25:54 write(1, "Copying data\n", 13Copying data
) = 13
23:25:54 mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fce71419000
23:25:54 read(4, "008fwant 3ae86e8ba1b586d5524498a"..., 32768) = 32768
23:25:54 write(6, "008fwant 3ae86e8ba1b586d5524498a"..., 32768) = 32768
23:25:54 read(4, "97dc0b386bbcaa027beaa93e3410\n003"..., 32768) = 32768
23:25:54 write(6, "97dc0b386bbcaa027beaa93e3410\n003"..., 32768) = 32768
23:25:54 read(4, "82b0772f49\n0032have 3758f751906a"..., 32768) = 32768
23:25:54 write(6, "82b0772f49\n0032have 3758f751906a"..., 32768) = 32768
23:25:54 futex(0x53ec98, FUTEX_WAKE, 1) = 1
23:25:54 read(4, "e ae2c611a8d06de18fcc629320c4aab"..., 32768) = 32768
23:25:54 write(6, "e ae2c611a8d06de18fcc629320c4aab"..., 32768) = 32768
23:25:54 futex(0x53ec98, FUTEX_WAKE, 1) = 1
23:25:54 read(4, "dcb932b73e69421da4fa2ff8\n0032hav"..., 32768) = 32768
23:25:54 write(6, "dcb932b73e69421da4fa2ff8\n0032hav"..., 32768) = 32768
23:25:54 futex(0x53ec98, FUTEX_WAKE, 1) = 1
23:25:54 read(4, "a07604\n0032have 79440f699a6da2f7"..., 32768) = 32768
23:25:54 write(6, "a07604\n0032have 79440f699a6da2f7"..., 32768strace: Process 17619 detached
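It looks like the program reads 196608 of the 205501 bytes (six 32768-byte chunks) before things stall: the first five writes to fd 6 (the child's stdin) complete, but the sixth never returns.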
Possible fix
In https://gitlab.com/gitlab-org/gitlab-ce/issues/25916#note_21083065, I described how running the io.Copy in a goroutine appears to solve the problem.
/cc: @jacobvosmaer-gitlab