
Add Redis instrumentation before `ensure_connected`

Sylvester Chin requested to merge sc1-instrument-before-ensureconnected into master

What does this MR do and why?

This MR adds Redis instrumentation that runs before `ensure_connected`. This lets us count connection-based errors such as EOFError and EPIPE before `ensure_connected` rescues and retries them, at which point they would otherwise go unnoticed.

We introduce a separate counter, gitlab_redis_client_connection_exceptions_total, so that these connection errors do not add noise to the error rates of the various Redis SLIs.

See discussion in gitlab-com/gl-infra/scalability#2564 (comment 1665334335)
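
To illustrate the idea, here is a minimal, self-contained sketch of counting a connection error inside a retrying wrapper. It is not the code in this MR: `FakeRedisClient`, `FakeConnectionError`, `instrument_connection_errors`, and the `CONNECTION_EXCEPTIONS` hash are hypothetical stand-ins for the real client, the real exception classes, and the Prometheus counter.

```ruby
# Hypothetical stand-ins; none of these are GitLab's or redis-rb's real classes.
class FakeConnectionError < StandardError; end

class FakeRedisClient
  attr_reader :attempts

  def initialize(failures:)
    @failures = failures # number of calls that fail before one succeeds
    @attempts = 0
  end

  # Retries the block once on a connection error, so the caller never
  # sees the first failure. This is what hides the error from outer
  # instrumentation.
  def ensure_connected
    tries = 0
    begin
      tries += 1
      yield
    rescue FakeConnectionError
      retry if tries < 2
      raise
    end
  end

  def raw_ping
    @attempts += 1
    if @failures > 0
      @failures -= 1
      raise FakeConnectionError, "connection closed by server"
    end
    "PONG"
  end
end

# Label set => count, standing in for a Prometheus counter.
CONNECTION_EXCEPTIONS = Hash.new(0)

# Records the exception and re-raises it, so retry behaviour is unchanged.
def instrument_connection_errors(storage)
  yield
rescue FakeConnectionError => e
  CONNECTION_EXCEPTIONS[{ exception: e.class.name, storage: storage }] += 1
  raise
end

def ping(client, storage)
  client.ensure_connected do
    # Instrumentation sits inside the retry wrapper, so it observes the
    # error before the wrapper rescues it.
    instrument_connection_errors(storage) { client.raw_ping }
  end
end

client = FakeRedisClient.new(failures: 1)
result = ping(client, "queues")
```

In this sketch the caller still gets "PONG" after the retry, while the counter records the one connection error that would otherwise have been swallowed.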

How to set up and validate locally

  1. Open a gdk rails console and run:
require 'prometheus/client/formats/text.rb'
rc = Gitlab::Redis::Queues.with {|c| c }
rc.ping # "PONG"
  2. Set the Redis idle timeout to 5 seconds:
gdk redis-cli -n 1 config set timeout 5
  3. Wait more than 5 seconds so that Redis closes the idle connection, then run rc.ping again and check that the counter increased by 1.
[6] pry(main)> Prometheus::Client::Formats::Text.marshal_multiprocess.split("\n").filter{|x| x.include?("gitlab_redis_client_connection_exceptions_total")}
=> ["# HELP gitlab_redis_client_connection_exceptions_total Multiprocess metric",
 "# TYPE gitlab_redis_client_connection_exceptions_total counter",
 "gitlab_redis_client_connection_exceptions_total{exception=\"Redis::ConnectionError\",storage=\"queues\"} 2",
[7] pry(main)> rc.ping
=> "PONG"
[8] pry(main)> Prometheus::Client::Formats::Text.marshal_multiprocess.split("\n").filter{|x| x.include?("gitlab_redis_client_connection_exceptions_total")}
=> ["# HELP gitlab_redis_client_connection_exceptions_total Multiprocess metric",
 "# TYPE gitlab_redis_client_connection_exceptions_total counter",
 "gitlab_redis_client_connection_exceptions_total{exception=\"Redis::ConnectionError\",storage=\"queues\"} 3",
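
The before/after comparison above can also be done programmatically. The following is a rough sketch in plain Ruby that parses Prometheus text output and computes the per-label increase of one counter. `counter_samples` and `counter_delta` are hypothetical helper names, not part of GitLab or the Prometheus client, and the sample text is copied from the console output above.

```ruby
# Returns { label_string => value } for one metric in Prometheus text format.
def counter_samples(exposition_text, metric_name)
  pattern = /\A(?<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?<labels>\{.*\})?\s+(?<value>\S+)\z/
  exposition_text.each_line.with_object({}) do |line, samples|
    line = line.strip
    next if line.empty? || line.start_with?("#") # skip HELP/TYPE comments
    match = line.match(pattern)
    next unless match && match[:name] == metric_name
    samples[match[:labels].to_s] = Float(match[:value])
  end
end

# Returns { label_string => increase } between two scrapes.
def counter_delta(before_text, after_text, metric_name)
  before = counter_samples(before_text, metric_name)
  after = counter_samples(after_text, metric_name)
  after.to_h { |labels, value| [labels, value - before.fetch(labels, 0.0)] }
end

METRIC = "gitlab_redis_client_connection_exceptions_total"
HEADER = "# HELP #{METRIC} Multiprocess metric\n# TYPE #{METRIC} counter\n"
LABELS = '{exception="Redis::ConnectionError",storage="queues"}'

before_text = "#{HEADER}#{METRIC}#{LABELS} 2\n"
after_text  = "#{HEADER}#{METRIC}#{LABELS} 3\n"

delta = counter_delta(before_text, after_text, METRIC)
```

In a rails console, the two text inputs would come from calling Prometheus::Client::Formats::Text.marshal_multiprocess before and after the second rc.ping.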

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Sylvester Chin
