Add client GRPC logging for Gitaly Ruby calls
At present, there are several hosts in the GitLab.com Gitaly fleet with very bad SLIs, particularly for gitalyruby
access.
An example of this type of issue is https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10953#note_389606586
One of the big problems I'm finding in diagnosing this issue is that, from a metrics point of view, gitalyruby is fairly opaque.
We do have GRPC client metrics for communicating with gitalyruby, but not much more.
In investigating issues such as https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10953#note_389606586, it would be really helpful to be able to know if the error rates come from a single Gitalyruby process or all of the processes simultaneously. At present, it's not possible to know.
In order to investigate further, we need one of the following:
1. Distributed Tracing enabled on GitLab.com: &210 (closed)
   - Gitalyruby is already instrumented for Distributed Tracing. This would allow us to understand which processes are affected
   - I'm unsure of when this will be delivered
2. Gitalyruby Request Logging
   - Optionally configure GRPC client logger in Go, writes logs alongside the main Gitaly access logs
   - Downside is more logging
   - Upside: easy to do
3. Additional logging metrics
   - With 50+ Gitaly servers, we would be hard pressed to increase the cardinality on these metrics unfortunately
Proposal
I propose we implement option 2, Gitalyruby request logging: add an optional client-side GRPC logging interceptor to the GRPC calls from gitaly-go to gitaly-ruby, so that these requests can be logged when the option is enabled.
We should make sure that the logs include the child process ID and the correlation ID, and that they set gitaly-ruby
as the type.