
Refactor Gitaly client stub to reduce the number of connections

What does this MR do and why?

For gitaly#4689 (closed)

Currently, GitLab Rails caches client stubs by (storage, service). Each new client stub object in Ruby dials and maintains its own connection to the target host, so a single GitLab Rails process may open too many connections to a Gitaly cluster: up to hundreds of connections per host. This wastes resources, because all clients targeting the same host could share one connection; multiplexing many calls over a single connection is the purpose of the HTTP/2 transport in the first place. Moreover, each connection may be used only a couple of times during the lifetime of the Rails process, while establishing a new connection has its own overhead and may add hundreds of milliseconds of latency before the first call.
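The connection growth can be illustrated with a minimal Ruby model. FakeChannel, FakeStub, and StubCacheByStorageAndService are hypothetical stand-ins, not the GitLab or grpc gem classes; each FakeChannel instance represents one dialed TCP connection. With the cache keyed by (storage, service), every additional service adds another connection to the same host, so a process ends up with roughly storages × services connections, multiplied again by the number of Puma workers.

```ruby
# Hypothetical stand-in for a gRPC channel: every instance represents one
# dialed TCP connection to a Gitaly address.
class FakeChannel
  @dialed = 0

  class << self
    attr_accessor :dialed # total number of connections dialed so far
  end

  attr_reader :address

  def initialize(address)
    @address = address
    FakeChannel.dialed += 1
  end
end

# Hypothetical stub that dials its own channel, mirroring the behavior
# described above where every cached stub owns a connection.
class FakeStub
  attr_reader :channel

  def initialize(address)
    @channel = FakeChannel.new(address)
  end
end

# Old-style cache: one stub, and therefore one connection, per (storage, service).
class StubCacheByStorageAndService
  def initialize(addresses)
    @addresses = addresses # storage name => Gitaly address
    @stubs = {}
  end

  def stub(storage, service)
    @stubs[[storage, service]] ||= FakeStub.new(@addresses.fetch(storage))
  end
end

services = %i[repository_service commit_service ref_service blob_service]
cache = StubCacheByStorageAndService.new("default" => "tcp://127.0.0.1:9999")
services.each { |service| cache.stub("default", service) }

puts FakeChannel.dialed # => 4 connections, all to the same host
```

Four services on one storage already cost four connections in a single process.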

Another advantage of creating one channel per host is name resolution. When establishing a connection, the gRPC client starts a watcher that resolves the target name and watches for resolution changes. This is particularly relevant for service discovery over DNS. Depending on the platform, the watcher may be lightweight (a goroutine in Go) or a full thread (in Ruby), so we would rather not start too many of them. This is essential for the epic Implement Praefect client-side load balancing a... (&8971 - closed).

This MR creates and caches gRPC channels by storage, so that all client stubs for the same storage share one channel.
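A minimal sketch of the per-storage approach, assuming stubs are still cached by (storage, service) but no longer dial their own connection. ChannelCache, FakeChannel, and FakeStub are hypothetical names, not the code in this MR. In the real grpc gem, a generated stub (a GRPC::ClientStub subclass) can reuse an existing channel through the channel_override: keyword, which is one way such sharing could be wired up. The Mutex guards the caches because Puma may serve requests from several threads in the same worker.

```ruby
# Hypothetical stand-in for a gRPC channel: one instance = one connection.
class FakeChannel
  attr_reader :address

  def initialize(address)
    @address = address
  end
end

# Hypothetical stub that reuses a channel handed to it instead of dialing
# its own (similar in spirit to grpc's channel_override: option).
class FakeStub
  attr_reader :service, :channel

  def initialize(service, channel)
    @service = service
    @channel = channel
  end
end

# Sketch of the new scheme: channels are cached per storage and shared by
# every stub for that storage.
class ChannelCache
  def initialize(addresses)
    @addresses = addresses # storage name => Gitaly address
    @channels = {}
    @stubs = {}
    @mutex = Mutex.new
  end

  def stub(storage, service)
    @mutex.synchronize do
      @stubs[[storage, service]] ||= FakeStub.new(service, channel_for(storage))
    end
  end

  def channel_count
    @mutex.synchronize { @channels.size }
  end

  private

  # Callers must already hold @mutex (Ruby's Mutex is not reentrant).
  def channel_for(storage)
    @channels[storage] ||= FakeChannel.new(@addresses.fetch(storage))
  end
end

# The "secondary" storage and its address are made up for this example.
cache = ChannelCache.new(
  "default" => "tcp://127.0.0.1:9999",
  "secondary" => "tcp://127.0.0.1:9998"
)
%w[default secondary].each do |storage|
  %i[repository_service commit_service ref_service].each do |service|
    cache.stub(storage, service)
  end
end

puts cache.channel_count # => 2
```

Six stubs across two storages share two channels instead of dialing six connections, and only two name-resolution watchers would be needed.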

Screenshots or screen recordings

This MR does not introduce user-facing changes.

How to set up and validate locally

  • Set up GDK to start Gitaly on port 9999.
  • Use the command lsof -i -n -P | grep 9999 to watch for TCP connections to that port.
  • At startup, Gitaly listens on port 9999:
gitaly    94642 qmnguyen    8u  IPv4 0x56944f3e4364e1ed      0t0  TCP 127.0.0.1:9999 (LISTEN)
gitaly    94642 qmnguyen   10u  IPv4 0x56944f3e4364e1ed      0t0  TCP 127.0.0.1:9999 (LISTEN)
  • Open the web UI and exercise features that call Gitaly so that the client stubs are activated. After a while, the command shows two established connections to the Gitaly server:
gitaly    94642 qmnguyen    8u  IPv4 0x56944f3e4364e1ed      0t0  TCP 127.0.0.1:9999 (LISTEN)
gitaly    94642 qmnguyen   10u  IPv4 0x56944f3e4364e1ed      0t0  TCP 127.0.0.1:9999 (LISTEN)
gitaly    94642 qmnguyen   14u  IPv4 0x56944f3e49e36a9d      0t0  TCP 127.0.0.1:9999->127.0.0.1:56242 (ESTABLISHED)
gitaly    94642 qmnguyen   15u  IPv4 0x56944f3e49dea1ed      0t0  TCP 127.0.0.1:9999->127.0.0.1:56246 (ESTABLISHED)
ruby      95100 qmnguyen   46u  IPv6 0x56944f3e41b4d5b5      0t0  TCP 127.0.0.1:56242->127.0.0.1:9999 (ESTABLISHED)
ruby      95101 qmnguyen   46u  IPv6 0x56944f3e41b4e4b5      0t0  TCP 127.0.0.1:56246->127.0.0.1:9999 (ESTABLISHED)
  • Use the ps aux command to inspect the owners of these TCP connections. They are the two workers of a Puma process, which means each worker process now uses only one connection to a single Gitaly storage:
qmnguyen         95100   0.3  2.6 410557424 867424   ??  S    12:02PM   0:08.11 puma: cluster worker 0: 94649 [gitlab-puma-worker]
qmnguyen         95101   0.0  3.0 410660848 990608   ??  S    12:02PM   0:11.53 puma: cluster worker 1: 94649 [gitlab-puma-worker]
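The check in the last two steps can be scripted. The sketch below parses the lsof and ps output shown above (pasted in as sample data) and counts, per process, the established client-side connections to port 9999, labeled with the process command from ps. The parsing assumes the column layout of lsof -i -n -P and ps aux seen in that output; other platforms or flags may lay out columns differently.

```ruby
# Sample output copied from the lsof and ps commands in the steps above.
LSOF_OUTPUT = <<~OUT
  gitaly    94642 qmnguyen    8u  IPv4 0x56944f3e4364e1ed      0t0  TCP 127.0.0.1:9999 (LISTEN)
  gitaly    94642 qmnguyen   10u  IPv4 0x56944f3e4364e1ed      0t0  TCP 127.0.0.1:9999 (LISTEN)
  gitaly    94642 qmnguyen   14u  IPv4 0x56944f3e49e36a9d      0t0  TCP 127.0.0.1:9999->127.0.0.1:56242 (ESTABLISHED)
  gitaly    94642 qmnguyen   15u  IPv4 0x56944f3e49dea1ed      0t0  TCP 127.0.0.1:9999->127.0.0.1:56246 (ESTABLISHED)
  ruby      95100 qmnguyen   46u  IPv6 0x56944f3e41b4d5b5      0t0  TCP 127.0.0.1:56242->127.0.0.1:9999 (ESTABLISHED)
  ruby      95101 qmnguyen   46u  IPv6 0x56944f3e41b4e4b5      0t0  TCP 127.0.0.1:56246->127.0.0.1:9999 (ESTABLISHED)
OUT

PS_OUTPUT = <<~OUT
  qmnguyen         95100   0.3  2.6 410557424 867424   ??  S    12:02PM   0:08.11 puma: cluster worker 0: 94649 [gitlab-puma-worker]
  qmnguyen         95101   0.0  3.0 410660848 990608   ??  S    12:02PM   0:11.53 puma: cluster worker 1: 94649 [gitlab-puma-worker]
OUT

GITALY_PORT = 9999

# Returns a Hash of PID => number of ESTABLISHED connections that the
# process opened *to* the given port (client side only).
def client_connections_by_pid(lsof_output, port)
  counts = Hash.new(0)
  lsof_output.each_line do |line|
    next unless line.include?("(ESTABLISHED)")

    fields = line.split
    # The NAME column looks like "127.0.0.1:56242->127.0.0.1:9999".
    name = fields.find { |field| field.include?("->") }
    next if name.nil?

    _local, remote = name.split("->", 2)
    next unless remote.end_with?(":#{port}")

    counts[fields[1]] += 1 # fields[1] is the PID column
  end
  counts
end

# Returns a Hash of PID => command, assuming the 11-column ps aux layout
# (USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND).
def commands_by_pid(ps_output)
  ps_output.each_line.each_with_object({}) do |line, commands|
    fields = line.strip.split(/\s+/, 11)
    next if fields.size < 11

    commands[fields[1]] = fields[10]
  end
end

commands = commands_by_pid(PS_OUTPUT)
client_connections_by_pid(LSOF_OUTPUT, GITALY_PORT).each do |pid, count|
  puts "#{pid} (#{commands.fetch(pid, 'unknown')}): #{count} connection(s)"
end
```

To run it against a live GDK instead of the pasted sample, the constants could be replaced with the captured command output, for example `lsof -i -n -P` and `ps aux` via Ruby backticks. Seeing one connection per Puma worker confirms that each worker shares a single channel per storage.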

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Quang-Minh Nguyen
