Investigate PostUploadPack requests failing with status 'Unavailable'
https://log.gitlap.com/goto/5995c997b946461a907925d32f4fa636
I think these correspond to the following workhorse errors: https://sentry.gitlap.com/gitlab/gitlab-workhorse-gitlabcom/issues/18342/events/2074809/ ('write: broken pipe'). My theory is that the HTTP client goes away, workhorse cancels the gRPC call, the git upload-pack process on the gitaly server fails to read from its stdin, and then cmd.Wait() in gitaly returns an error because the process exited with a non-zero exit code. But this is just a theory.
I also noticed that this error seems to happen repeatedly on a small number of repositories. One of those repositories shows a clone every 10 minutes, like clockwork: https://log.gitlap.com/goto/c623871da0e8f0fbb9f95dda09665e97 Not all of these clones fail, but some do.
Possible things we can do now:
- try to reproduce the error (I have had no luck so far in GDK)
- change the PostUploadPack handler in Gitaly so that if the request context is cancelled we don't return any cmd.Wait() error
- furthermore (?), don't log 'context cancelled' as an error at all but increment a counter instead? If we only change the error code, the noise from these errors doesn't go away.