Rack Attack redis calls showing up in the slowlog
Problems
It looks like since we started tracking requests for Rack::Attack, the redis calls have been starting to show up in the slowlog: https://log.gprd.gitlab.net/goto/1fc5d96e99ca3bdf82074228239174cb
. These are the SET
and INCRBY
, which are already O(1)
so I think this is caused by simultaneous calls racing each other? We are still not sure why. Looking at the Redis-related metrics, there is no way to differentiate RackAttack Redis' usage from the rest. As RackAttack is the gatekeeper for all Web/Api/GraphQL requests, it's importance is high enough to have a special motoring treatment. I think there are something we'll be interested in:
- The number of Redis calls RackAttack makes on every request.
- The total time spent for Redis at RackAttack layer on every request.
As RackAttack eventually call Redis for every single request, those information should be tracked for legitimate requests too. After we got the numbers, we may have a clear insight about what's next.
Proposals
Although we already have Redis metrics, there are some problems with it:
- Those metrics are broken down by Redis instance. Right now, we are still using a shared instance
- We'll need to add a new label into that metric. Apart from RackAttack, I'm not sure whether there is another component beneficial by this change.
- It's hard to add a special treatment into recent Redis instrumentation. We are using an interceptor to patch the Redis client instance directly. Adding rack attack information into that may require a global state (in SafeRequestStore?) right before issuing a Redis query. It's error-prone and seems to be a rabbit hole.
RackAttack supports cache store configuration. We just need to create a new cache store instance, and instrument on that instance. No need to touch the recent Redis instrumentation system. The effort is low and there isn't any risk for this approach it's absolutely isolated.
class RackAttackStore
def write(name, values, options = {})
# Record rack attack usage here
# Blah blah
Rails.cache.write(name, values, options)
end
end
Rack::Attack.cache.store = RackAttackCacheStore
Some useful information at the moment:
rack_attack_redis_calls
rack_attack_redis_duration
After we have the instrumentation data, we'll need to integrate into the structured logs. If a request is rejected by RackAttack, there won't be any request log. Rack Attack outputs an auth logging item instead. As a sequence, we'll need to add the instrumentation information (possibly all payloads provided by Gitlab::InstrumentationHelper.add_instrumentation_data
) into this kind of logs as well.