Skip to content

Add counter for cross-slot requests count

Sylvester Chin requested to merge schin1-track-crossslot-by-commands into master

What does this MR do and why?

This MR adds a counter gitlab_redis_client_cross_slot_requests_total for cross-slot commands with the cmd label. Note that it is currently only tracking commands without an allow_cross_slot_commands scope. This helps to track existing cross-slot operations that "slipped through the cracks" and are not covered with a allow_cross_slot_commands scope.

See gitlab-com/gl-infra/scalability#2005 (closed)

How this counter can be useful: we can track on thanos 1 plot per redis shard. To switch over cleanly to Redis Cluster, the counter needs to be zero. Having cmd helps us to narrow down areas where more work is needed.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

How to set up and validate locally

  1. Apply the following diff
diff --git a/app/models/active_session.rb b/app/models/active_session.rb
index b16c4a2b353d..e306c99df04e 100644
--- a/app/models/active_session.rb
+++ b/app/models/active_session.rb
@@ -83,7 +83,6 @@ def self.set(user, request)
         is_impersonated: request.session[:impersonator_id].present?
       )

-      Gitlab::Instrumentation::RedisClusterValidator.allow_cross_slot_commands do
         redis.pipelined do |pipeline|
           pipeline.setex(
             key_name(user.id, session_private_id),
@@ -91,18 +90,17 @@ def self.set(user, request)
             active_user_session.dump
           )

-          # Deprecated legacy format - temporary to support mixed deployments
-          pipeline.setex(
-            key_name_v1(user.id, session_private_id),
-            expiry,
-            Marshal.dump(active_user_session)
-          )
-
-          pipeline.sadd?(
-            lookup_key_name(user.id),
-            session_private_id
-          )
-        end
+        # Deprecated legacy format - temporary to support mixed deployments
+        pipeline.setex(
+          key_name_v1(user.id, session_private_id),
+          expiry,
+          Marshal.dump(active_user_session)
+        )
+
+        pipeline.sadd?(
+          lookup_key_name(user.id),
+          session_private_id
+        )
       end
     end
   end
@@ -242,10 +240,8 @@ def dump
   private_class_method def self.raw_active_session_entries(redis, session_ids, user_id)
     return {} if session_ids.empty?

-    found = Gitlab::Instrumentation::RedisClusterValidator.allow_cross_slot_commands do
-      entry_keys = session_ids.map { |session_id| key_name(user_id, session_id) }
-      session_ids.zip(redis.mget(entry_keys)).to_h
-    end
+    entry_keys = session_ids.map { |session_id| key_name(user_id, session_id) }
+    found = session_ids.zip(redis.mget(entry_keys)).to_h

     found.compact!
     missing = session_ids - found.keys
diff --git a/lib/gitlab/instrumentation/redis_base.rb b/lib/gitlab/instrumentation/redis_base.rb
index 404eed94302a..cc4f9aec5059 100644
--- a/lib/gitlab/instrumentation/redis_base.rb
+++ b/lib/gitlab/instrumentation/redis_base.rb
@@ -79,8 +79,6 @@ def redis_cluster_validate!(commands)
           ::Gitlab::Instrumentation::RedisClusterValidator.validate!(commands) if @redis_cluster_validation
           true
         rescue ::Gitlab::Instrumentation::RedisClusterValidator::CrossSlotError
-          raise if Rails.env.development? || Rails.env.test? # raise in test environments to catch violations
-
           false
         end
  1. Start/restart gdk
  2. Open 2 localhost:3000 sessions
  3. Check metrics (set web_exporter.enabled in config/gitlab.yml to true)
curl localhost:8083/metrics 2> /dev/null   | grep gitlab_redis_client_cross_slot_requests_total

# HELP gitlab_redis_client_cross_slot_requests_total Multiprocess metric
# TYPE gitlab_redis_client_cross_slot_requests_total counter
gitlab_redis_client_cross_slot_requests_total{allowed_caller="unknown",cmd="pipelined",storage="sessions"} 30

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Sylvester Chin

Merge request reports