gitaly-ruby does not report underlying HTTP errors when proxy errors occur
Customer ticket: https://gitlab.zendesk.com/agent/tickets/174915
This issue caused a lot of grief this past weekend, and many hours were spent troubleshooting the issue. Quick summary:
- When trying to commit a file via the Web UI, the request would timeout with a
DeadlineExceedederror. However, pushes via the command-line worked fine. - The customer had
https_proxyset with an HTTP, not HTTPS, proxy. - As a result, all gRPC calls made by
gitaly-rubywere proxied, which was unexpected. - Fixing this required adding the Gitaly host IP to the
no_proxylist. - The proxy server itself was returning a clear error message, but this message was nowhere to be found in the Gitaly logs. We only found this in the
straceoutput.
To reproduce this problem, set up something like:
gitaly['env'] = {
'https_proxy' => 'http://proxy.example.com',
'http_proxy' => 'http://proxy.example.com',
'no_proxy' => 'gitlab.example.com'
}
Note that https_proxy has a proxy prefixed with http:// instead of https://. This has an unintended side effect of causing gitaly-ruby to proxy all gRPC calls through the proxy, even when gitaly-ruby is trying to talk to its parent Gitaly process.
If an HTTPS proxy is used instead, gRPC will ignore this setting since HTTPS proxies are not supported (https://github.com/grpc/grpc/issues/20939). Instead, you get this warning message at startup:
{
"level":"info",
"msg":"E1004 06:12:06.742912935 5366 http_proxy.cc:62] 'https' scheme not supported in proxy URI",
"supervisor.args":[
"bundle",
"exec",
"bin/ruby-cd",
"/var/opt/gitlab/gitaly",
"/opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby",
"5255",
"/tmp/gitaly-ruby148060389/socket.1"
],
"supervisor.name":"gitaly-ruby.1",
"time":"2020-10-04T06:12:06Z"
}
Using tinyproxy and allowing all CONNECT calls, you can see which gRPC calls are being made by gitaly-ruby during the UserCommitFiles RPC:
This looks like it originates from https://gitlab.com/gitlab-org/gitaly/blob/5c284bc03e942ab4aed269d316c84b26b1e91aaf/ruby/lib/gitlab/git/operation_service.rb#L82.
sequenceDiagram
autonumber
Client->>+Rails: POST /group/project/master/README.md
Rails->>+Gitaly: UserCommitFiles
Gitaly->>+gitaly_ruby: UserCommitFiles
gitaly_ruby->>proxy: CONNECT
gitaly_ruby->>proxy: HasLocalBranches?
proxy->>+Gitaly: HasLocalBranches?
gitaly_ruby->>proxy: RefExists?
proxy->>+Gitaly: RefExists?
gitaly_ruby->>gitaly_hooks: pre-receive
gitaly_hooks->>Rails: POST /api/v4/internal/allowed
Rails->>gitaly_hooks: 200 OK
gitaly_hooks->>gitaly_ruby: OK
gitaly_ruby->>gitaly_ruby: update-ref
Step 4 is where the proxying of RPC happens, and that shouldn't be happening.
Ideas for improvement:
- Log the underlying gRPC error for gitaly-ruby somewhere. (this issue)
- Document the need to add the current host IP to the
no_proxylist if an HTTP proxy is used in HTTPS. - Add some Gitaly check test (similar to the
gitlab-shell/bin/check) that tests for:- Connectivity with the GitLab API
- Have gitaly-ruby make a loopback request back to the Gitaly server
