Refactor Gitaly client stub to reduce the number of connections
What does this MR do and why?
Currently, GitLab Rails caches client stubs by (storage, service). Each client stub object created in Ruby dials and maintains its own connection to the target host. As a result, a single GitLab Rails process may open far more connections to the Gitaly cluster than it needs, up to hundreds per host. This wastes resources: all stubs targeting the same host could share one connection, which is what HTTP/2 multiplexing was designed for in the first place. Many of these connections may be used only a couple of times during the lifetime of the Rails process. In addition, establishing a new connection has overhead and can add hundreds of milliseconds of latency before the first call.
Another advantage of creating one channel per host concerns name resolution. When establishing a connection, the Gitaly client starts a watcher that resolves the target name and watches for changes. This is particularly relevant for service discovery over DNS. Depending on the platform, the watcher may be lightweight (a goroutine in Go) or a full thread (in Ruby). We would rather not start more of them than necessary. This is essential for Implement Praefect client-side load balancing a... (&8971 - closed).
This MR creates and caches gRPC channels by storage, so that all client stubs for a storage share a single channel.
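The caching scheme can be sketched as follows. This is a minimal, self-contained illustration, not the code in this MR: `FakeChannel`, `FakeStub`, and `ChannelCache` are hypothetical stand-ins so the example runs without the `grpc` gem. In the real gem, generated stubs inherit from `GRPC::ClientStub`, whose constructor accepts a `channel_override:` keyword, which is one plausible way to hand several stubs the same channel. The sketch keeps stubs cached per (storage, service) but makes every stub for a storage reuse that storage's single channel.

```ruby
require 'monitor'

# Hypothetical stand-in for a gRPC channel (such as GRPC::Core::Channel).
# Creating one models dialing a new HTTP/2 connection to a host.
class FakeChannel
  class << self
    attr_accessor :opened # number of channels (connections) created so far
  end
  self.opened = 0

  attr_reader :address

  def initialize(address)
    @address = address
    self.class.opened += 1
  end
end

# Hypothetical stand-in for a generated service stub; it only holds a channel.
FakeStub = Struct.new(:service, :channel)

# Caches one channel per storage and one stub per (storage, service).
# All stubs for the same storage share that storage's channel.
class ChannelCache
  def initialize(storage_addresses)
    @storage_addresses = storage_addresses # e.g. { 'default' => 'localhost:9999' }
    @channels = {}
    @stubs = {}
    # Monitor is reentrant, so #stub can call #channel while holding the lock.
    @lock = Monitor.new
  end

  def channel(storage)
    @lock.synchronize do
      @channels[storage] ||= FakeChannel.new(@storage_addresses.fetch(storage))
    end
  end

  def stub(service, storage)
    @lock.synchronize do
      @stubs[[storage, service]] ||= FakeStub.new(service, channel(storage))
    end
  end
end

cache = ChannelCache.new('default' => 'localhost:9999')
%i[commit_service ref_service repository_service].each do |service|
  cache.stub(service, 'default')
end
puts FakeChannel.opened # three stubs, one channel: prints 1
```

Because each storage maps to exactly one channel in this sketch, any per-channel resources, such as the name-resolution watcher described above, would also be created once per storage rather than once per stub.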
Screenshots or screen recordings
This MR does not introduce user-facing changes.
How to set up and validate locally
- Set up GDK to start Gitaly on port 9999.
- Use the command
lsof -i -n -P | grep 9999
to watch for TCP connections to that port.
- At startup, Gitaly listens on port 9999:
gitaly 94642 qmnguyen 8u IPv4 0x56944f3e4364e1ed 0t0 TCP 127.0.0.1:9999 (LISTEN)
gitaly 94642 qmnguyen 10u IPv4 0x56944f3e4364e1ed 0t0 TCP 127.0.0.1:9999 (LISTEN)
- Open the web UI and exercise as many features as possible to activate the client stubs. After a while, the command shows two established connections to the Gitaly server:
gitaly 94642 qmnguyen 8u IPv4 0x56944f3e4364e1ed 0t0 TCP 127.0.0.1:9999 (LISTEN)
gitaly 94642 qmnguyen 10u IPv4 0x56944f3e4364e1ed 0t0 TCP 127.0.0.1:9999 (LISTEN)
gitaly 94642 qmnguyen 14u IPv4 0x56944f3e49e36a9d 0t0 TCP 127.0.0.1:9999->127.0.0.1:56242 (ESTABLISHED)
gitaly 94642 qmnguyen 15u IPv4 0x56944f3e49dea1ed 0t0 TCP 127.0.0.1:9999->127.0.0.1:56246 (ESTABLISHED)
ruby 95100 qmnguyen 46u IPv6 0x56944f3e41b4d5b5 0t0 TCP 127.0.0.1:56242->127.0.0.1:9999 (ESTABLISHED)
ruby 95101 qmnguyen 46u IPv6 0x56944f3e41b4e4b5 0t0 TCP 127.0.0.1:56246->127.0.0.1:9999 (ESTABLISHED)
- Use the command
ps aux
to identify the processes that own these TCP connections. They are the two workers of the same Puma process, which means each worker now uses only one connection to a single Gitaly storage:
qmnguyen 95100 0.3 2.6 410557424 867424 ?? S 12:02PM 0:08.11 puma: cluster worker 0: 94649 [gitlab-puma-worker]
qmnguyen 95101 0.0 3.0 410660848 990608 ?? S 12:02PM 0:11.53 puma: cluster worker 1: 94649 [gitlab-puma-worker]
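To check the result without reading the lsof output by eye, a small helper can count client-side ESTABLISHED sockets to port 9999 per PID. This is a hypothetical sketch: `connections_per_pid` is not part of GitLab or GDK, and it assumes lsof's default column layout (`COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME`). The sample data is the lsof output shown above.

```ruby
# Sample data: the lsof lines shown above, after exercising the web UI.
LSOF_OUTPUT = <<~OUT
  gitaly 94642 qmnguyen 8u IPv4 0x56944f3e4364e1ed 0t0 TCP 127.0.0.1:9999 (LISTEN)
  gitaly 94642 qmnguyen 14u IPv4 0x56944f3e49e36a9d 0t0 TCP 127.0.0.1:9999->127.0.0.1:56242 (ESTABLISHED)
  gitaly 94642 qmnguyen 15u IPv4 0x56944f3e49dea1ed 0t0 TCP 127.0.0.1:9999->127.0.0.1:56246 (ESTABLISHED)
  ruby 95100 qmnguyen 46u IPv6 0x56944f3e41b4d5b5 0t0 TCP 127.0.0.1:56242->127.0.0.1:9999 (ESTABLISHED)
  ruby 95101 qmnguyen 46u IPv6 0x56944f3e41b4e4b5 0t0 TCP 127.0.0.1:56246->127.0.0.1:9999 (ESTABLISHED)
OUT

# Hypothetical helper: counts ESTABLISHED client connections whose remote
# end is the given port, grouped by PID (lsof's second column).
def connections_per_pid(lsof_output, port)
  lsof_output.each_line.with_object(Hash.new(0)) do |line, counts|
    next unless line.include?('(ESTABLISHED)')

    fields = line.split
    # The NAME column looks like "local->remote" for connected sockets.
    name = fields.find { |field| field.include?('->') }
    next if name.nil?

    _local, remote = name.split('->', 2)
    counts[fields[1]] += 1 if remote.end_with?(":#{port}")
  end
end

counts = connections_per_pid(LSOF_OUTPUT, 9999)
counts.each { |pid, n| puts "PID #{pid}: #{n} connection(s)" }
# Each Puma worker PID (95100, 95101) has exactly one connection.
```

On a live machine, the same helper could be fed the command output directly, for example via `%x(lsof -i -n -P)`. A count greater than one for any Puma worker PID would suggest that stubs are not sharing a channel.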
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.