Skip to content

Report to Sentry for long Redis durations

Sean McGivern requested to merge report-to-sentry-for-long-redis-durations into master

We're seeing some very high (over 5 seconds) Redis durations in our logs, and we don't know what's causing them. This is an attempt to answer that by sending an event to Sentry when we detect this happening.

It's behind the feature flag report_on_long_redis_durations, which is off by default.

I tested this locally with these commands in a console:

[1] pry(main)> Feature.enable(:report_on_long_redis_durations)
  TRANSACTION (0.3ms)  BEGIN /*application:console,line:/lib/gitlab/database/schema_cache_with_renamed_table.rb:25:in `columns'*/
  Feature::FlipperFeature Load (0.5ms)  SELECT "features".* FROM "features" WHERE "features"."key" = 'report_on_long_redis_durations' ORDER BY "features"."id" ASC LIMIT 1 /*application:console,line:/lib/gitlab/database.rb:202:in `block in transaction'*/
  TRANSACTION (0.3ms)  COMMIT /*application:console,line:/lib/gitlab/database.rb:202:in `block in transaction'*/
  TRANSACTION (0.2ms)  BEGIN /*application:console,line:/lib/gitlab/database/schema_cache_with_renamed_table.rb:25:in `columns'*/
  Feature::FlipperGate Load (0.3ms)  SELECT "feature_gates".* FROM "feature_gates" WHERE "feature_gates"."feature_key" = 'report_on_long_redis_durations' /*application:console,line:/lib/gitlab/database.rb:202:in `block in transaction'*/
  Feature::FlipperGate Load (0.4ms)  SELECT "feature_gates".* FROM "feature_gates" WHERE "feature_gates"."feature_key" = 'report_on_long_redis_durations' AND "feature_gates"."key" = 'boolean' /*application:console,line:/lib/gitlab/database.rb:202:in `block in transaction'*/
  Feature::FlipperGate Create (0.5ms)  INSERT INTO "feature_gates" ("feature_key", "key", "value", "created_at", "updated_at") VALUES ('report_on_long_redis_durations', 'boolean', 'true', '2021-08-11 17:44:25.932371', '2021-08-11 17:44:25.932371') RETURNING "id" /*application:console,line:/lib/gitlab/database.rb:202:in `block in transaction'*/
  TRANSACTION (1.4ms)  COMMIT /*application:console,line:/lib/gitlab/database.rb:202:in `block in transaction'*/
  Feature::FlipperGate Load (0.2ms)  SELECT "feature_gates".* FROM "feature_gates" WHERE "feature_gates"."feature_key" = 'report_on_long_redis_durations' /*application:console,line:/ee/lib/ee/feature.rb:24:in `enable'*/
=> true
[2] pry(main)> Gitlab::Redis::SharedState.with { |r| r.mget('foo', 'foo') { sleep 1.3 } }
Sending event 7329ddbc74f64f158c01d6b3f7074ee0 to Sentry
Raven HTTP Transport connecting to https://sentry.gitlab.net
=> 1
[3] pry(main)> Gitlab::Redis::SharedState.with { |r| r.mget('foo', 'foo') { sleep 1 } }
=> 1
[4] pry(main)> Feature.disable(:report_on_long_redis_durations)
  TRANSACTION (0.3ms)  BEGIN /*application:console,line:/lib/gitlab/database.rb:202:in `block in transaction'*/
  Feature::FlipperFeature Load (0.3ms)  SELECT "features".* FROM "features" WHERE "features"."key" = 'report_on_long_redis_durations' ORDER BY "features"."id" ASC LIMIT 1 /*application:console,line:/lib/gitlab/database.rb:202:in `block in transaction'*/
  TRANSACTION (0.2ms)  COMMIT /*application:console,line:/lib/gitlab/database.rb:202:in `block in transaction'*/
  Feature::FlipperGate Load (0.3ms)  SELECT "feature_gates".* FROM "feature_gates" WHERE "feature_gates"."feature_key" = 'report_on_long_redis_durations' /*application:console,line:/ee/lib/ee/feature.rb:31:in `disable'*/
  TRANSACTION (0.1ms)  BEGIN /*application:console,line:/ee/lib/ee/feature.rb:31:in `disable'*/
  Feature::FlipperGate Destroy (0.4ms)  DELETE FROM "feature_gates" WHERE "feature_gates"."id" = 5 /*application:console,line:/ee/lib/ee/feature.rb:31:in `disable'*/
  TRANSACTION (1.3ms)  COMMIT /*application:console,line:/ee/lib/ee/feature.rb:31:in `disable'*/
  Feature::FlipperGate Load (0.3ms)  SELECT "feature_gates".* FROM "feature_gates" WHERE "feature_gates"."feature_key" = 'report_on_long_redis_durations' /*application:console,line:/ee/lib/ee/feature.rb:31:in `disable'*/
=> true
[5] pry(main)> Gitlab::Redis::SharedState.with { |r| r.mget('foo', 'foo') { sleep 1.3 } }
=> 1

As you can see, the first sent an event to Sentry and the second and third didn't. Also note that this isn't 5 seconds of Redis time - our measurements are also including Ruby code that's run alongside a command like this.

The example event in this case was https://sentry.gitlab.net/gitlab/sean-test/issues/2756773/?query=is%3Aunresolved, where the backtrace isn't very interesting. Hopefully it will be more interesting when we see these in production!

For gitlab-com/gl-infra/scalability#1183 (closed).

Edited by Sean McGivern

Merge request reports